Midterm #2 Prep, CSE-4411(A), F-2009, York University

CSE-4411(A)
Database Management Systems
York University
Fall 2009

Test #2: Preparation

Layout

Probably four or five questions. I am aiming to make it shorter than midterm #1 so there is not as much time pressure. The type of questions and layout of the test will be similar the first midterm.

exercise

E.g., What is the I/O cost of this query plan?
How many tuples is this query likely to return? (Use assumptions of System R to estimate.)

short answer

E.g., Which join algorithm would be most appropriate for this query?

analysis

E.g., How would we need to modify the 2-pass sort-merge join to accommodate...

Coverage

Midterm #2 is not intended to be cumulative, and so will not test directly on the subjects covered up to Midterm #1, the physical database, per se.

So not covered:

physical database
- disk space manager
- buffer pool manager
- file, page, and record formats
- internals of index data structures
  - B+ trees
  - extendable and linear hashs
- basics of the external sort algorithm

That said, everything we have done since Midterm #1 builds on the material from before. So in that sense, you are responsible for the previous material for this exam, by necessity.

indexes (needed for access paths) (Ch.8) REVIEW
- prefix matching
- cost model for using
- clustered vs unclustered
- ...
external sorting (Ch.13)
- basics REVIEW
  Many other relational algorithms are based on this, and have the same performance issues.
- advanced issues (e.g., double buffering, block reads and writes, etc.)
query evaluation, overview (Ch.12)
1. access paths
2. cost model & System R
  1. reduction factors
  2. cost estimation
3. query trees & alternative plans
  1. pipelining
  2. optimization "rules" like pushing selects
  3. ...
evaluating the relational operators (Ch.14)
1. select & access paths
2. project with distinct
3. join
  1. nested loops join
    1. block nested loops join
    2. index nested loops join
  2. sort-merge join
    1. naīve: sort first, then merge
    2. 2-pass: accomplishes join in two passes
  3. hash join (two pass)
4. the set operations
5. aggregation
relational optimizer (Ch.12&14)
Just preliminaries, as covered in Chapter 12, but not as discussed in the recent classes from Chapter 15. Nothing advanced.
1. System R
2. left-join trees
3. pipelining

Only topics covered in the reading and in class are fair game. I shall try to be careful not to emphasize anything in depth from the optimizer itself. However, topics from the optimizer overview in Chapter 14 are fair game. You should know the basic concepts.

Policies

The test will be open-note, open-book. This is a change from test #1! You may bring a calculator, but the test will be designed so that any math shouldn't need one.

There will be space on the text packet for writing answers. I will bring extra paper, in case anyone needs it, to attach.

The test will be for the class lecture time, so 75 minutes.

Parke Godfrey