Advanced Database Systems
CSE 6421
Fall 2012
York University


Semester: Fall 2012
Course/Sect#: CSE 6421
Time: Tu, Th 1-2:30 PM
Location: Ross 156 (on Sep 18); Lassonde Bldg. 2002 thereafter
Instructor: Jarek Gryz
Office: 2049 CSB
Office Hours: Tu, Th 11:30 AM-12 PM, and by appointment
Ph#: 416-736-2100 x70150
e-mail: [my first name]@cs.yorku.ca

The final exam will cover the following sections of the textbook: 8 through 15, 22.1-22.10, 25 (lecture material only), and 26 (lecture material only). You are also responsible for additional material covered in lectures (e.g., semantic query optimization).

The final exam is due (by email or in hard copy) on Monday, Dec 17, at 9 AM.


Materials

Welcome to the Advanced Database Systems course, CSE 6421, for the Fall 2012 term. Materials, instructions, and notices for the course will accumulate here over the semester. The lecture slides and student presentations are now available.

Presentation Schedule (subject to change)

Date   | Topic                                                        | Presenters
Oct 18 | Science databases                                            | Yulia Kotseruba
Oct 23 | SQL vs. non-SQL databases                                    | Mohammed-Ali Khan, Boze Zekan
Oct 25 | RDF databases                                                | Nick Yakovets (guest lecture)
Oct 30 | No class                                                     |
Nov 6  |                                                              |
Nov 8  | Extending database usability; Protection of outsourced data | Michelle Brown, Maria Angel Marquez Andrade
Nov 13 | Databases for robotics applications                          | Thomas Young, Li Ying
Nov 15 | Big data                                                     | Kayvan Tirdad, Hang Gao
Nov 20 | Mobile databases                                             | Niloofar Banivaheb
Nov 22 | Class evaluation                                             |
Nov 27 |                                                              |
Nov 29 |                                                              |

The Course


Required Textbook / Reading

Database Management Systems.
Third Edition, 2003.
Raghu Ramakrishnan and Johannes Gehrke
WCB/McGraw Hill.
ISBN: 0-07-232206-3
URL: http://www.cs.wisc.edu/~dbbook

Student presentations and additional lecture slides can be found here.


Course Objectives and Content

In this course, we go "under the hood" to learn how a relational database management system is built. Students will learn the issues involved in designing efficient database systems, and the strategies, data structures, and algorithms used to implement such systems.

The course is organized in three parts: the physical database, query processing, and query optimization. Specific topics include the following:

I. The Physical Database
  • file organizations
  • indexes
    • tree-structured indexing
    • hash-based indexing
  • external sorting
II. Query Processing
  • evaluation of relational operators
    • selection
    • projection
    • joins (the many ways)
    • set operations
    • aggregate operations
  • physical database design and tuning
III. Query Optimization
  • rewrite optimization
    • semantic query optimization
    • magic sets
    • the COUNT bug
    • ...
  • cost optimization
    • cost model
    • selectivity estimation
    • ...
  • new paradigms in query optimization


Grading Criteria / Course Requirements

Component            | Weight | When
Report or Project    | 30%    | Due on the last day of classes
Presentation         | 30%    | Second half of the semester
Take-Home Final Exam | 40%    | Due Monday, Dec 17, at 9 AM

York University's rules on academic honesty and plagiarism remain in effect at all times. You may discuss the projects with others, but you may not collaborate on them: the work you submit must be your own. Exams, of course, must be done on your own.


Project

Students will design and implement some of the key algorithms and data structures of a relational database system. Your code will be an extension of Derby (Apache Derby), an open-source relational database system written in Java. The project may be done individually or in a team, depending on its complexity.

Late projects will not be accepted unless approval has been obtained in advance, for good reason.

Note that the project requires a substantial amount of work; start early so that you can complete it on time.

In addition to the code, the project requires a report describing what has been achieved and a short demonstration of the system with and without the added functionality.


Presentations

Every student must choose a research topic in databases to explore and present. The student will read a few seminal papers on the topic, chosen jointly by the instructor and the student (the papers on this list are a good starting point). He or she will then prepare and give an oral presentation of the topic to the class. Presentations are scheduled for the second half of the course.


Report (The "Proposal")

Each student will write a report on a topic agreed upon with the instructor. The report is written as a proposal for research work to be done. (You will not actually carry out the research; your job is to identify a viable research problem in databases that should be addressed.) The task is to identify an unaddressed problem in databases that would be useful to solve, to conduct a literature search on the problem, and to propose a methodology by which it could be addressed.

The report is to be written in conference-paper format. It should have:

  • an abstract,
  • an introduction which describes and motivates the problem at hand, and demonstrates the importance of the problem,
  • a related work section that provides evidence that the problem is unresolved,
  • a methodologies section in which you propose how the problem might be successfully addressed,
  • conclusions, and
  • a proper bibliography.
Milestones for the report will be:
  1. Topic meeting with the instructor
  2. Initial proposal (~1 page)
  3. Progress review (meeting)
  4. Extended abstract (~2-3 pages)
  5. Final report (10 pages)