Information Integration - OLAP




CS157b

Chris Pollett

May 4, 2020

Outline

Information Integration

Use Cases of Information Integration

The Heterogeneity Problem

Three Information Integration Techniques

Below are three common techniques for integrating data sources. We will go into them in more detail after the quiz.

  1. Federated Databases. Here we have multiple independent data sources, but one source can call on the others to supply information.
  2. Warehousing. Copies of data from several sources are stored in a single database called a warehouse. This data has usually been processed to a common schema, filtered, and aggregated. The warehouse is usually updated periodically (say, once a day at off-peak times).
  3. Mediation. A mediator is a software component that supports a virtual database, which the user may query. It stores no data itself; instead, it queries its sources and synthesizes a response from their answers.
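
To make the difference between warehousing and mediation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not an implementation from the lecture: the two car dealer sources, their field names, and every function name below are hypothetical. Federation is not modeled. The warehouse answers from a stored copy, which can be stale, while the mediator asks the sources at query time.

```python
# Hypothetical sources: two car dealers whose schemas differ (names are made up).
dealer_a = [{"serial": 1, "model": "Gobi", "color": "red"}]
dealer_b = [{"vin": 2, "make_model": "Gobi", "paint": "blue"}]

def rows_from_a():
    """Translate dealer A's rows into a common (id, model, color) schema."""
    return [{"id": r["serial"], "model": r["model"], "color": r["color"]}
            for r in dealer_a]

def rows_from_b():
    """Translate dealer B's rows into the same common schema."""
    return [{"id": r["vin"], "model": r["make_model"], "color": r["paint"]}
            for r in dealer_b]

def build_warehouse():
    """Warehousing: copy every source into one store under the common schema."""
    return rows_from_a() + rows_from_b()

def query_warehouse(warehouse, model):
    """Answer from the stored copy; it reflects the sources as of the last build."""
    return [r for r in warehouse if r["model"] == model]

def query_mediator(model):
    """Mediation: store nothing; read the sources at query time and combine."""
    return [r for r in rows_from_a() + rows_from_b() if r["model"] == model]

warehouse = build_warehouse()
# A source changes after the warehouse was built.
dealer_a.append({"serial": 3, "model": "Gobi", "color": "green"})

stale = query_warehouse(warehouse, "Gobi")  # does not see the new car yet
live = query_mediator("Gobi")               # sees the new car
```

In this sketch the warehouse would only see the new row after `build_warehouse()` runs again, which corresponds to the periodic update mentioned in item 2.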

Quiz

Which of the following statements is true?

  1. A typical distributed database might involve a shared memory architecture where the network speed is fast as compared to disk I/O.
  2. One of the steps in computing a distributed join using the semi-join algorithm from class involved doing the two phase commit protocol.
  3. One way to avoid lock bottlenecks in the distributed setting is for each transaction to have a lock coordinator and to use primary copy locking to handle replication issues.

Federated Database Systems

Example Federated Database Architecture

Data Warehouses

Example Data Warehouse Architecture

Warehouses Example

Mediators

Example Mediator Mechanics

Mediator Example

Mediator Wrappers - Templates

Mediator Wrappers - Filters

Global as View Optimization

Online Analytical Processing

Example OLAP Application

Multidimensional View of OLAP Data

Car dealer example of multidimensional data

Approaches to OLAP