CSE-6490B
Issues in Information Integration
York University
Fall 2008
Syllabus

Instructor: Parke Godfrey
Office: #2050 CSE
Office Hours: Mon 1-3pm & by appointment / availability
Ph#: 416-736-2100 x66671
e-mail: godfrey@cse.yorku.ca

Term: Fall 2008
Time: We 2:30-5:30pm
Place: CB #129

The Topic
Database technology has been quite successful for organizing and storing
very large amounts of information,
and providing efficient access to this information
via powerful query languages.
A database management system also helps to protect
the integrity of its data.
Unfortunately,
the information we might need often is not located at a single source,
in a single database,
or we do not know in which source the information resides.
The answer to a query may need to compose information from various
sources.
This is called information integration.
Information integration technology and research
are becoming increasingly vital.
Companies maintain many databases and information sources.
They have operating tasks that require various degrees
of integration.
Governments need integration for everything from
security to offering better services.
The World Wide Web can be viewed as a vast collection
of information sources from which we would like to extract information
collectively.

Course Objectives and Content
In this course,
we shall study topics, research, and developments
in information integration.
It will be conducted much as a readings course.
Each class,
we shall cover and discuss one key paper and its topic,
and additional related papers (recommended reading).
We start from a formal perspective,
for two reasons:
a formal foundation helps to elucidate the problems
and potential solutions;
and this is needed to understand
much of the research in information integration.
Tentative topics to be covered are as follows.
- Introduction
  - What is the problem?
    - The center cannot hold.
    - Information is distributed over different sources.
      It is not all in one place under one schema.
  - What are solutions?
- Background
  - Datalog: A Primer
    - the data model
    - evaluation strategies (top-down and bottom-up)
    - negation, open and closed world assumptions
    - answer set semantics
  - SQL-3
- Mediation
  - Query Answering over distributed data sources [semantics]
    - query rewrite
    - query containment and query folding
    - semantic query caching
  - Schema and Data Integration and Mapping [frameworks]
    - mediators and wrappers
    - global-as-view vs. local-as-view
    - schema mapping and integration
    - source-to-target dependencies
    - schematic discrepancies
    - peer to peer approaches
    - conflict resolution, repairs, and consistent query answers
    - management of uncertain and disjunctive information
    - cooperative answering systems
  - Query Optimization & Caching for Mediators [implementation]
    - semantic query optimization
    - dynamic query processing
- Data Models and Semi-structured Data
  - XML
    - XML data model
    - XPath and XQuery
    - XML integrity constraints
    - XML query processing
    - description logics behind XML
  - storing semi-structured data in relational DBMSs
  - search versus queries
- Ontologies and Semantic Web
- Application Domains
  - bio-informatics
  - emergency and rescue
  - ...
This list is tentative and will no doubt be adjusted during the course.
In reality,
we probably cannot cover all those topics.
Furthermore,
we shall need to adjust dynamically,
based on everyone's interests.

Readings
There is no textbook.
There will be a primary paper
(journal or conference article or book chapter)
assigned for each class session.
These will be made available on the class webpage.
There will often be a second, recommended paper each time.
(The discussion leader and responder
will be responsible for covering these.)
The other material will be class notes (as lecture slides),
tutorial notes, etc., which will be covered in class
and made available on the class webpage.

Grading Criteria / Course Requirements

| Components | Percentage | When |
|---|---|---|
| Assignments | 3 × 10% = 30% | over the term |
| Classtime Activities | 25% | over the term |
| - discussion leader | 10% | |
| - responder | 10% | |
| - participation | 5% | |
| Report or Project | 25% | due at end of classes |
| Final Exam | 20% | takehome, due in exam period |

The grading policy is standard.
Discussion is fine,
but your assignment work must be your own.
Class attendance will not be monitored,
but is important
since the class will involve readings and discussion.
Each person will serve as a discussion leader one time
for a given topic.
He or she will need to read that day's paper (and "recommended" paper)
thoroughly.
The leader will present a brief summary of the papers
with the important issues, results, and open problems.
He or she will then lead the discussion.
Each person will also serve as a responder one time
for a given topic.
The responder also will need to read that day's paper
(and "recommended" paper) thoroughly.
He or she prepares questions for the leader.
Each student may choose to do either a term report
or a term project.
A term report will be like a technical report
and will summarize a specific topic.
(The ballpark is that it would be 12 to 15 pages.)
A term project would involve implementing something / building a prototype,
or applying existing information integration tools
to a specific problem domain.
Term projects can be done in teams (up to two people),
with the scope of the project commensurate with the team size, of course.
A team project will receive a single grade.
Report / project topics are to be negotiated with the instructor.
There will be a takehome final exam worth 20%,
to be done during the exam period.
It will have exercises and questions pertinent to the readings and
course lectures and materials.
It will be open "book",
but, of course, without consultation with others.
York University's rules for academic honesty
and plagiarism are always in effect.
(See below.)
Discussion is fine on the assignments and projects.
However, collaboration (outside of a project team) is not.
The work must be your own.

| Week# | day | topic |
|---|---|---|
| #1 | 3 Sept | Introduction & Background |
| #2 | 10 Sept | Logic & Non-monotonicity |
| #3 | 17 Sept | rewrite, folding, & containment |
| #4 | 24 Sept | mediation, GAV, & LAV |
| | 1 Oct | no class (Rosh Hashanah) |
| #5 | 8 Oct | schema mapping & peer-to-peer |
| #6 | 15 Oct | query optimization for mediators |
| #7 | 22 Oct | query caching for mediators |
| #8 | 29 Oct | semi-structured data |
| #9 | 5 Nov | XML |
| #10 | 12 Nov | XPath & XQuery |
| #11 | 19 Nov | search versus query |
| #12 | 26 Nov | semantic web & ontologies / applications |

Academic Integrity / Honesty / Plagiarism
The Department of Computer Science (& Engineering)
Academic Honesty Guidelines
are in effect for this course,
as, indeed, they are for any CS&E course.
Plagiarism is defined as taking the language, ideas, or thoughts
of another,
and representing them as your own.
If you use someone else's ideas, cite them.
If you use someone else's words, clearly mark them as a quotation.
Note that plagiarism includes using another's computer programs or pieces
of a program.
All noted instances of plagiarism will be reported.
These policies are not intended to keep students
from working with other students.
One can learn much working with others,
so this is to be encouraged.
Should you encounter any situation in which you are uncertain
whether collaboration is permitted,
please ask.