COSC 4412 3.0 (Summer School 2011)

CSE4412 3.0 Data Mining (Held at the BRSU in Germany, Summer 2011)

Course Outline

Course Director: Prof. Martin Müller

This is a course on "knowledge discovery". In contrast to data mining, where we try to identify patterns in sets of data, knowledge discovery tries to "understand" the meaning of data and then explain the patterns.
A second difference is that we focus on "knowledge" rather than on data. Knowledge consists of propositions about properties of data, whereas data mining usually involves preprocessing data according to a predefined model.

Nevertheless, most methods presented in this course are well known, be it from data mining or from machine learning. We will learn about:

- knowledge and knowledge representation
- sets, concepts, hypotheses, targets, and errors
- several techniques:
  - clustering (k-Means)
  - decision tree induction
  - rough set data analysis
  - Artificial Neural Networks (MLPs and SOMs)
  - Genetic Algorithms
  - Hidden Markov Models
  - inductive logic programming
  - ensemble learning (Bagging and Boosting)

If there is enough time left, we will conclude with a brief outlook on learning theory. Depending on the prior knowledge of the attending students, we may well cover only some of the topics.

The course also includes lab sessions. Their organisation has not been settled yet; in general, there are two options:

The first is to get a feel for the methods using existing software packages such as WEKA or RapidMiner.
The second is a rather "free" software project in which small groups of students implement several packages that together provide functionality similar to (parts of) WEKA or RapidMiner.

The second option requires more preliminary knowledge, so we shall decide on the spot which option to take.

If you have a laptop, please bring it with you. If you don't, please tell Nadine Fröbel as soon as possible so we can organise the required number of workstations.

Grading will be based on a written exam at the end of the course (not on the project, unless too many of you prefer sightseeing to lab sessions).

Books:

I can't recommend "a" or "the" book. The topics we deal with are the subject of many books.

For an overview I still recommend Tom Mitchell's "Machine Learning".
Similar, but more recent, is Alpaydin's "Introduction to Machine Learning".

Also interesting are:

Lavrac/Dzeroski, "Inductive Logic Programming"
de Raedt, "Logical and Relational Learning"
Kearns, "The Computational Complexity of Machine Learning"
Anthony/Biggs, "Computational Learning Theory"
Pawlak, "Rough Sets"
Kohonen, "Self-Organizing Maps"
Ripley, "Pattern Recognition and Neural Networks"