|
CSE-6421:
Advanced Database Systems
York University
Winter 2016
|
Syllabus
|
|
|
Instructor:
|
Parke Godfrey
|
Office:
|
#2050 Lassonde
|
Office Hours:
|
We 2–4pm
|
& by appointment / availability
|
Ph#:
|
416-736-2100 x66671
|
e-mail:
|
godfrey@cse.yorku.ca
|
|
|
|
Term:
|
winter 2015
|
Time:
|
Tu & Th 5:30pm–7:00pm
|
Place:
|
CC #335
|
|
|
|
|
|
|
Description (from the academic calendar)
This
course provides an introduction to, and an in-depth study on, several
new developments in database systems and intelligent information systems.
Topics include: internet databases, data warehousing and OLAP,
object-relational, object-oriented, and deductive databases.
Formal semantics of relational databases and systems, physical database
tuning, advanced issues in query optimization and transaction processing,
advanced database facilities such as triggers and materialized views,
query caching, and database mediation.
Current topics in database research and development.
|
|
Course Objectives & Content
The
course is arranged in three parts.
- Logic
- System
- Beyond Relational
- Directions
First, we start at the theoretical end of databases
to consider issues of semantics.
This is necessary,
if we are to address important questions as in the following.
- What does a query mean;
how do we know whether the answer table returned by the database system
is correct?
- How expressive is the query language?
What types of queries are we unable to ask?
Would recursion add anything?
Would negation?
- When are two queries equivalent?
- When are two database schemas equivalent?
- How computationally difficult are these questions to answer?
To address these,
we study formal, logical models of databases and queries.
Next, we move to the systems end of databases
and go “under the hood” of the system
(a relational database management system, RDBMS)
to understand how it works.
We study how a RDBMS is constructed.
We study the physical database,
how the data is stored, organized,
and efficiently accessed.
We next study how SQL queries are efficiently evaluated.
In particular, we look at the query optimizer
which turns the SQL query into an access plan
that the database system can execute.
We review other functionalities that a database system needs to support
such as transaction management.
We then look at alternative data models and query languages.
We study their differences,
their advantages and disadvantages,
and what they offer.
Last,
we move to a selection of topics
found in current database research and development.
This course is meant to prepare students with
a good operating knowledge of modern relational database systems,
a knowledge of current research and development issues in databases,
and
a foundation for conducting research in this area.
It is expected that students enter the course
with some background in databases,
namely an understanding of, and facility with, SQL and
schema design,
and have some hands-on experience with an RDBMS.
|
|
Readings
Reading
and study material will be assigned from the course textbook,
from relevant conference and journal papers,
and lecture notes.
|
|
|
|
Grading Criteria & Course Requirements
|
|
Components
What
|
Percentage
|
Assignments
|
5 × 5%
|
Report
|
25%
|
Presentation
|
10%
|
Summary Log
|
10%
|
Participation
|
5%
|
Final Exam
|
25%
|
|
|
Assignments
There
will be five assignments throughout the term,
approximately one due every two weeks.
These will be a mix
of homework assignments related to the topics we discuss
and small projects,
such as to write queries or analyze query plans.
|
|
Report
(“The Proposal”)
Each
student will write a report.
The report topic will be negotiated between the instructor and student.
The report will be written as a proposal
for research work to be done.
(You will not actually do that research.
Your work here is to identify a viable research problem
in databases that should be addressed.)
The task is to identify an unaddressed problem in databases
which would be useful to address,
to conduct a literature search with respect to the problem,
and to propose a methodology by which the problem could potentially be
addressed.
The report is to be written in conference-paper format
in the LNCS format.
It should have
- an abstract,
- an introduction which describes and motivates the problem at hand,
and demonstrates the importance of the problem,
- a related works section which demonstrates evidence
that the problem is unresolved,
- a methodologies section in which you propose how the problem might be
successfully addressed,
- conclusions, and
- a proper bibliograpghy.
Milestones for the report will be:
- Topic meeting with instructor
- Initial Proposal (~1 page)
- Progress Review (meeting)
- Extended Abstract (~2-3 pages)
- Final Report (≤ 12 pages)
|
|
Presentation
You
will make a presentation to the class on a given topic.
It will be based usually upon two or three research papers.
One of the papers will be assigned to the class as a whole to read
before the talk.
The talk will be for twenty minutes,
with ten more minutes for a question & answer period
and discussion.
The topic of a student's presentation
typically will be closely related to his / her report topic,
although this is not strictly necessary.
Talks will be graded in part by the class.
The topic a student chooses for his or her talk
can be related to his or her project,
but it does not have to be.
Talks will be scheduled for the last quarter of the semester
(Directions).
The milestones for talks are:
- Choosing a talk topic.
The student and instructor will negotiate the research papers
to be read.
- The presentation itself.
|
|
Summary Log & Participation
Readings
of research papers in addition to the text will be assigned,
particularly during the Semantics and Topics
parts of the course.
For each of the research papers assigned to the class as a whole,
the student should
-
write a reasonable one to two paragraph summary
of the paper,
and
-
compose one question about the work
that he or she feels is not well addressed in the paper.
These summaries can be turned in by e-mail to the instructor
progressively through the term
(up to the final due date).
Clearly, this component is simply a trick, er mechanism,
to ensure that students stay somewhat current with the reading.
Students may leave out up to three papers
from their summary log with no penalty.
Class participation will be graded for 5% towards the final grade.
|
|
Final Exam
There
will be a comprehensive final exam
that contributes 25% towards the final grade.
|
|
|
|
The
schedule may be revised as the term progresses.
Textbook readings are indicated.
Readings of research papers will also be assigned.
Note that this schedule is tentative and is subject to change.
Wk#
|
Day
|
Topic
|
Reading
|
Due
|
#1
|
Tu 5 Jan
|
Into the briar patch.
Introduction.
|
§1
|
|
I. Logic: What does it mean?
|
|
Th 7 Jan
|
It's only logical.
|
slides
|
|
#2
|
Tu 12 Jan
|
Logic Primer.
|
|
|
|
Th 14 Jan
|
I refute that!
|
§24
|
|
#3
|
Tu 19 Jan
|
Deductive databases, Datalog, & Prolog
|
|
|
|
Th 21 Jan
|
Negation by any other name...
|
A
|
A1
|
#4
|
Tu 26 Jan
|
Negation & non-monotonicity
|
B
|
|
|
Th 28 Jan
|
query containment, relational calculus
|
§4.3,
C
|
R1
|
#5
|
Tu 2 Feb
|
What was the question?
|
§5,
D
|
|
|
Th 4 Feb
|
SQL
|
E
|
R2
|
II. System: How does it work?
|
#6
|
Tu 9 Feb
|
the physical database & cost model
|
§8, 9
|
|
|
Th 11 Feb
|
indexes
|
§10, 11
|
A2
|
Reading Week
(13–19 February)
|
#7
|
Tu 23 Feb
|
Psst. Want a join algorithm, cheap?
|
§14
|
|
|
Th 25 Feb
|
Implementing the algebra.
|
F
|
P1
|
#8
|
Tu 1 Mar
|
I have a plan.
|
§15,
G
|
|
|
Th 3 Mar
|
the query optimizer(s)
|
H
|
A3
|
III. Beyond Relational: What more is there?
|
#9
|
Tu 8 Mar
|
XML & XQuery
|
§27,
I
|
|
|
Th 10 Mar
|
RDF & SPARQL
|
J
|
R3
|
#10
|
Tu 15 Mar
|
Scale up & Scale out (“Big Data”)
|
K
|
|
|
Th 17 Mar
|
PIG, et al.
|
L
|
A4
|
IV. Directions: Where to next?
|
#11
|
Tu 22 Mar
|
|
M,
N
|
P2
|
|
Th 24 Mar
|
|
O,
P
|
R4
|
#12
|
Tu 29 Mar
|
|
Q,
R
|
P2
|
|
Th 31 Mar
|
|
S,
T
|
A5
|
|
#13
|
? Apr
|
Final Exam
|
R5
|
|
§x:
|
Readings from the textbook.
|
Q:
|
Provided reading assigned.
|
Ax:
|
Assignment x due.
|
Rx:
|
Report milestone due.
|
Px:
|
Presentation milestone due.
|
|
|
|
|
Academic Integrity / Honesty / Plagiarism
The
Department of Electrical Engineering & Computer Science
Academic Honesty Guidelines
are in effect for this course,
as, indeed, they are for any CS&E / EECS course.
Plagiarism is defined as taking the language, ideas, or thoughts
of another,
and representing them as your own.
If you use someone else's ideas, cite them.
If you use someone else's words, clearly mark them as a quotation.
Note that plagiarism includes using another's computer programs or pieces
of a program.
All noted instances of plagiarism will be reported.
These policies are not intended to keep students
from working with other students.
One can learn much working with others,
so this is to be encouraged.
Should you encounter any situations for which you are uncertain
whether the collaboration is permitted or not,
please ask.
|
|
|