Decision Support Systems
COSC-6490C
Winter 2002
York University


Semester: Winter 2002
Course/Sect#: COSC-6490C
Time: Tue 11:30am-1:00pm, Fri 11:00am-12:30pm
Location: CCB 120
Instructor: Aijun An
Office: CSB 2048
Office Hours: Fri 1:30-2:30pm, and by appointment
Ph#: 416-736-2100 x44298
e-mail: aan@cs.yorku.ca


Welcome to the Decision Support Systems course, COSC-6490C, for Winter 2002. Materials, instructions, and notices for the course will accumulate here over the semester.


Description

Data mining, or knowledge discovery from databases (KDD), is one of the most active areas of research in databases. It lies at the intersection of database systems, statistics, AI/machine learning, and data visualization. This course introduces data warehousing, OLAP technology, and data mining methods, and covers their principles, algorithms, implementations, and applications.


Prerequisites

  • Required: an introductory course on database systems.
  • Preferred: some background in artificial intelligence, machine learning, and statistics.


Text and Reference Books

  • Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
  • S.M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998.
  • Selected conference and journal papers (to be handed out in class).


Lecture Notes


Assignments


Grading Scheme

  • Assignments (20%)
  • Midterm exam (20%)
  • Paper review and presentation (20%)
  • Course project (40%)


Paper Review and Presentation

Each student will select a research paper from the list below, write a short summary of the paper and prepare a presentation to be made to the class.
  • Paper presentation schedule
  • Reading list
  • Written summary (Due March 5). Each student should write a short report (2-3 pages) that summarizes the objectives and results of the chosen research paper. The report should address the following questions:
    • What are the objectives of the paper?
    • What technique(s) does the paper propose?
    • What are the advantages of the technique?
    • What are the limitations of the technique?
    • What are the open (unresolved) issues in the research?
  • Class presentation. Each student should prepare a 20-minute presentation for the class. The objective is to introduce the class to the topic or technique the paper describes.


Projects

Projects can be done in groups of 1-2 students. A project will include the following components:
  • Project proposal (Due March 8). Each group should write a project proposal (1-2 pages) that describes the project topic, approach, and schedule.
  • Implementation. Each group will implement a data mining or data preprocessing algorithm. The program should be tested on a number of data sets and should be usable in practice. You are welcome to work on a topic of your own interest with the consent of the instructor. Examples of topics are:
    • Discretization algorithms
    • Methods for handling missing values
    • Classification algorithms
    • Association rule mining algorithms
    • Clustering algorithms
    • Clustering or classification of web documents according to document attributes, usage and content
    • Association rule extraction from web access logs
    • Text mining from text documents
  • Written report. Your project report should include the following:
    • Description of the algorithm or method implemented.
    • Description of your system design (system components, data structures, control flow structures, etc.)
    • Outline of your test procedures and your test results in terms of performance measures.
    • Concise and clear user manual that describes how to use your system.
    • Sample output, limitations and known bugs.
    • Discussions:
      • Based on your understanding and experiments, discuss the advantages and disadvantages of the algorithm or approach that you have implemented.
      • Based on your experiments and observations, does your implementation scale well to very large databases (e.g., if your data set were scaled up 100,000 times)? Why or why not?
      • Discuss how to further improve your system.
  • Class presentation (see the project presentation schedule).
  • System demo.
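
For the scalability question in the written report, one way to gather evidence is to time the implementation on progressively larger copies of a data set and see how the running time grows. The following is only a minimal sketch under stated assumptions: `count_items` is a hypothetical stand-in for whatever algorithm a group implements, `replicate` and `time_scaling` are illustrative helper names, and the tiny in-memory sample is made up. Replicating a real data set 100,000 times will usually not fit in memory, so in practice the growth trend would be extrapolated from smaller factors or the data streamed from disk.

```python
import time


def replicate(rows, factor):
    """Return the data set repeated `factor` times (a crude way to scale it up)."""
    return rows * factor


def count_items(rows):
    """Hypothetical stand-in for the algorithm under test: one pass counting items."""
    counts = {}
    for row in rows:
        for item in row:
            counts[item] = counts.get(item, 0) + 1
    return counts


def time_scaling(algorithm, rows, factors):
    """Time `algorithm` on each scaled copy; return (factor, n_rows, seconds) tuples."""
    results = []
    for factor in factors:
        data = replicate(rows, factor)
        start = time.perf_counter()
        algorithm(data)
        elapsed = time.perf_counter() - start
        results.append((factor, len(data), elapsed))
    return results


if __name__ == "__main__":
    # Made-up sample transactions, used only to exercise the timing loop.
    sample = [["bread", "milk"], ["bread", "beer"], ["milk", "beer", "eggs"]]
    for factor, n_rows, seconds in time_scaling(count_items, sample, [1, 10, 100, 1000]):
        print(f"x{factor:>5}  rows={n_rows:>6}  time={seconds:.6f}s")
```

If the measured time grows roughly in proportion to the number of rows, the implementation is behaving linearly on this kind of input; faster growth is worth explaining in the discussion section of the report.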


Schedules


Useful On-line Information

Data Sets in ELEM2 Formats