Reading List:
Choose one of the papers listed below and email me your selection.
Please note that you can NOT choose the papers shown in red (marked "Chosen by ..."),
as they have already been selected by others.
Data Warehousing
Feature Selection
Discretization of Continuous Attributes
- Compression-Based Discretization of Continuous Attributes, B. Pfahringer,
Proceedings of the 12th International Conference on Machine Learning,
Morgan Kaufmann Publishers, San Francisco, CA. 1995.
- Global Discretization of Continuous
Attributes as Preprocessing for Machine Learning, M.R. Chmielewski, J.W.
Grzymala-Busse, International Journal of Approximate Reasoning, 15 (1996), 319-331.
Classification
- RainForest: A framework for fast decision tree
construction of large datasets, In VLDB'98, pp. 416-427, New York, NY, 1998. (Chosen by Zhihua Wen)
- Learning Trees and Rules with Set-valued
Features, William W. Cohen, Proceedings of the Thirteenth National
Conference on Artificial Intelligence (AAAI-96), 1996.
- Bagging, Boosting, and C4.5, J. R. Quinlan, AAAI'96, pp. 725-730
- Searching
Classification Knowledge in Databases Based on Rough Sets.
N. Shan, W. Ziarko, H. Hamilton, and N. Cercone, KDD-96,
pp. 271-274, Portland, OR, August 1996
Association Rules
- Mining the Most Interesting Rules, Roberto J. Bayardo Jr.,
Rakesh Agrawal, In Proc. of the 5th ACM SIGKDD Int'l Conf. on
Knowledge Discovery and Data Mining, August 1999.
- H-Mine: Hyper-structure Mining of Frequent
Patterns in Large Databases, J. Pei, J. Han, H. Lu, S. Tang, and
D. Yang, Proc. of the 2001 IEEE International Conference on Data Mining (ICDM'01),
San Jose, California, November 29-December 2, 2001. (Chosen by Junjie Guo)
- Constraint-Based Rule Mining in Large, Dense Databases, Roberto
J. Bayardo Jr., Rakesh Agrawal and Dimitrios Gunopulos,
Proceedings of the 15th International Conference on Data
Engineering, March 1999, Sydney, Australia, IEEE CS Press,
1999
- Beyond
Market Baskets: Generalizing Association Rules to
Correlations, Craig Silverstein, Sergey Brin, Rajeev
Motwani, Data Mining and Knowledge Discovery, 2, 1998,
pp. 39-68
- Scalable
Techniques for Mining Causal Structures, Craig
Silverstein, Rajeev Motwani, Sergey Brin, and Jeff D.
Ullman, Proceedings of the 24th International Conference
on Very Large Data Bases (VLDB), 1998
Clustering
- CACTUS-Clustering
Categorical Data Using Summaries, Venkatesh Ganti,
Johannes Gehrke, Raghu Ramakrishnan, Proc. 5th ACM SIGKDD
International Conference on Knowledge Discovery and Data
Mining (KDD-99), 1999 Aug, pp. 73-83
- Clustering
Large Datasets in Arbitrary Metric Spaces, Venkatesh
Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L.
Powell, James C. French, Proceedings of the 15th
International Conference on Data Engineering, 23-26 March
1999, Sydney, Australia, IEEE CS Press, 1999, pp.
502-511
- ROCK:
A Robust Clustering Algorithm for Categorical Attributes,
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings
of the 15th International Conference on Data Engineering,
23-26 March 1999, Sydney, Australia, IEEE CS Press,
1999, pp. 512-521
- BIRCH:
an efficient data clustering method for very large
databases, Tian Zhang, Raghu Ramakrishnan, Miron
Livny, Proceedings of the 1996 ACM SIGMOD international
conference on Management of data , 1996, pp. 103-114
- CURE:
An Efficient Clustering Algorithm for Large Databases,
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings
of the ACM SIGMOD Conference, 1998
- A
Density-Based Algorithm for Discovering Clusters in Large
Spatial Databases with Noise, M. Ester, H.-P.
Kriegel, J. Sander, X. Xu, Proc. 2nd Int. Conf. on
Knowledge Discovery and Data Mining (KDD-96), Portland,
OR, 1996, pp. 226-231
- Efficient
and Effective Clustering Methods for Spatial Data Mining,
Raymond T. Ng, Jiawei Han, Intelligent Database Systems
Research Laboratory, Proc. of 1994 Int'l Conf. on Very
Large Data Bases (VLDB'94), Santiago, Chile, September
1994, pp. 144-155
Web Mining
- Learning to Extract Symbolic Knowledge
from the World Wide Web, M. Craven, D. DiPasquo, D. Freitag, A. McCallum,
T. Mitchell, K. Nigam and S. Slattery, Proceedings of the 15th National
Conference on Artificial Intelligence (AAAI-98), pp. 509-516,
Madison, WI. AAAI Press. (Chosen by Linyan Wang)
- Mining the Link Structure of the World Wide
Web, Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar,
Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and
Jon Kleinberg,
IEEE Computer, vol. 32, no. 8, August 1999. (Chosen by Jane Tao)
- Data mining of user navigation patterns,
J. Borges and M. Levene, In Web Usage Analysis and User
Profiling, pp. 92-111. Published by Springer-Verlag as Lecture Notes in Computer Science, Vol. 1836, 2000. (Chosen by Qian Wan)
- A
Framework for Collaborative, Content-Based and Demographic
Filtering, Michael J. Pazzani, Artificial
Intelligence Review. (Chosen by Mariya Koshkina)
Spatial Mining
Sampling
- Is
Sampling Useful in Data Mining? A Case in the Maintenance
of Discovered Association Rules, S.D. Lee, David W.
Cheung, Ben Kao, Data Mining and Knowledge Discovery, An
International Journal, Vol. 2, pp. 233-262, Kluwer
Academic Publishers, 1998
- Sampling
Large Databases for Association Rules, Hannu
Toivonen, In 22nd International Conference on Very Large
Data Bases (VLDB'96), pp. 134-145, Mumbai, India, September
1996. Morgan Kaufmann (Chosen by Bohdan Krushelnytskyy)
Visualization
Miscellaneous
- A database perspective on knowledge
discovery, Tomasz
Imielinski and Heikki Mannila, Communications of the ACM,
Vol. 39, No. 11 (Nov. 1996) Pages 58 - 64 (Chosen by Ying Zou)
- Real-world Data is Dirty: Data Cleansing and The Merge/Purge
Problem, Mauricio A. Hernández, Salvatore J.
Stolfo, Data Mining and Knowledge Discovery, Vol. 2, No.
1, 1998, pp. 9-37 (Chosen by Chenchen Xiao)
- Discovering Robust Knowledge from Databases that Change,
Chun-Nan Hsu and Craig A. Knoblock, Data Mining and
Knowledge Discovery, 2(1), 1998, pp. 69-95