Test #2 Preparation Guide | EECS-4411M : Database Management Systems | winter term 2017

EECS-4411M
Database Management Systems
York University
Winter 2017

Test #2: Preparation

Layout

Probably four or five questions, with the layout and types of questions quite similar to Test #1.

exercise

E.g., What is the I/O cost of this query plan?
How many tuples is this query likely to return? (Use assumptions of System R to estimate.)

short answer

E.g., Which join algorithm would be most appropriate for this query?

analysis

E.g., How would we need to modify the 2-pass sort-merge join to accommodate...

Coverage

Test #2 is not intended to be cumulative, and so will not test directly on the subjects covered up to Test #1, the physical database, per se. We have covered the second third of the course, query processing.

So not covered:

physical database
- disk space manager
- buffer pool manager
- file, page, and record formats
- internals of index data structures
  - B+ trees
  - extendable and linear hashs

That said, everything we have done since Test #1 builds on the material from before. So in that sense, you are responsible for the previous material for this test, by necessity.

Test #2 covers query processing: Ch.15 & 16. In general, the questions will be drawn from the intersection of the readings and lectures in class. However, general concept questions drawn from the readings are also fair game.

indexes (needed for access paths) [Ch.15]
- prefix matching
- cost model for using
- clustered vs unclustered
- ...
external sorting [Ch.15 §7]
- basics Many other relational algorithms are based on this, and have the same performance issues.
- advanced issues (e.g., double buffering, block reads and writes, etc.)
query evaluation, overview (Ch.12) [Ch.15 §1, Ch.16 §1,2]
1. access paths
2. cost model & System R
  1. reduction factors
  2. cost estimation
3. query trees & alternative plans
  1. pipelining
  2. optimization "rules" like pushing selects
  3. ...
evaluating the relational operators [Ch.15]
1. select & access paths
2. project with distinct
3. join
  1. nested loops join
    1. block nested loops join
    2. index nested loops join
  2. sort-merge join
    1. naīve: sort first, then merge
    2. 2-pass: accomplishes join in two passes
  3. hash join (two pass)
4. the set operations
5. aggregation
relational optimizer [Ch.16]
As covered in our overview in class; have gone through Ch.16 very generally, so far. Nothing advanced from Ch.16.
1. System R
2. left-join trees
3. pipelining
4. general about the join-enumeration algorithm

Policies

The test will be closed-note, closed-book. You may bring a calculator, but the test will be designed so that any math shouldn't need one.

There will be space on the text packet for writing answers. I will bring extra paper, in case anyone needs it, to attach.

The test will be for the class lecture time, so 75 minutes.

Parke Godfrey