One and Two Pass Query Algorithms




CS157b

Chris Pollett

Mar 2, 2020

Outline

Introduction

Model of Computation for Physical Operators

Parameters for Measuring Cost

Quiz

Which of the following statements is true?

  1. Grid files can be used for neighborhood lookup queries such as find all points in a rectangle.
  2. Any k-d tree is a multi-key index (as the latter was described in class).
  3. Quad-trees are typically used to provide an index for 3D data.

I/O Cost for Scan Operators

Aside - Sorting in Secondary Storage

Bottom up Merge Sort

Two Phase Multiway Merge-Sort (TPMMS)

N-Way Merge sort

Phase I Example

Phase II

Phase II - cont'd (finishes the aside on sorting in secondary storage)

How much more efficient is this last idea than just merging two blocks?

Iterators for Implementing Physical Operators

One Pass Algorithms -- Tuple at a Time Operations

One-Pass Algorithms for Unary, Full-Relation Operations

One Pass Algorithms for Binary Operations

Recall `M` = number of memory blocks; `B(T)` = number of blocks in table `T`.
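As a rough illustration of how `M` and `B(T)` constrain a one-pass binary operation, here is a minimal Python sketch of a one-pass equijoin. It assumes the usual textbook condition that the smaller table must fit in `M - 1` blocks, leaving one block to stream the larger table. The function name `one_pass_join`, the `block_size` parameter (tuples per block), and the in-memory lists standing in for disk-resident tables are all hypothetical and not taken from the slides.

```python
from collections import defaultdict

def one_pass_join(small, large, M, block_size, key_small=0, key_large=0):
    """Join two lists of tuples on the given key positions in one pass.

    Assumption: `small` must occupy at most M - 1 blocks so that one
    block remains free for reading `large` a block at a time.
    """
    blocks_small = -(-len(small) // block_size)  # ceiling division: B(small)
    if blocks_small > M - 1:
        raise ValueError("smaller table does not fit in M - 1 memory blocks")

    # Build phase: load the smaller table into an in-memory lookup structure.
    table = defaultdict(list)
    for s in small:
        table[s[key_small]].append(s)

    # Probe phase: stream the larger table once, emitting matching pairs.
    result = []
    for t in large:
        for s in table.get(t[key_large], []):
            result.append(s + t)
    return result

R = [(1, "x"), (2, "y")]
S = [(2, "p"), (3, "q"), (2, "r")]
print(one_pass_join(R, S, M=3, block_size=2))
```

Under these assumptions each table is read exactly once, so the I/O cost is about `B(R) + B(S)` block reads.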

More One Pass Algorithms for Binary Operations

Nested Loop Joins

More Nested Loop Join

Two Pass Algorithms Based On Sorting

Duplicate Elimination using Sorting

Given a relation `R`, we sort `R` and output each distinct tuple once.
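The idea can be sketched in a few lines of Python. This is only an in-memory illustration: the built-in `sorted` stands in for an external sort such as TPMMS, and the function name `distinct_sorted` is hypothetical. Once the tuples are sorted, duplicates are adjacent, so one scan that keeps the first tuple of each run suffices.

```python
from itertools import groupby

def distinct_sorted(tuples):
    """Sort the tuples, then keep one representative per run of equal tuples."""
    ordered = sorted(tuples)  # stand-in for an external sort (assumption)
    # groupby collapses each run of adjacent equal tuples into one key.
    return [key for key, _run in groupby(ordered)]

R = [(2, "b"), (1, "a"), (2, "b"), (3, "c"), (1, "a")]
print(distinct_sorted(R))
```

Because equal tuples are adjacent after sorting, the scan only needs to remember the most recently emitted tuple, regardless of how large `R` is.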

Sort Based Duplicate Elimination

Grouping and Aggregation using Sorting

Sorting and Unions, Intersections, etc.

Sort-based Join