Reading List

Choose one of the papers listed below and email me your selection. Note that papers shown in red (marked "chosen by ...") have already been selected by others and can NOT be chosen.

Association Rule and Frequent Pattern Mining

Beyond Market Baskets: Generalizing Association Rules to Correlations, Craig Silverstein, Sergey Brin, Rajeev Motwani, Data Mining and Knowledge Discovery, 2, 1998, pp. 39-68
J. Pei and J. Han,
Scalable Techniques for Mining Causal Structures, Craig Silverstein, Rajeev Motwani, Sergey Brin, and Jeff D. Ullman, Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), 1998
Brian Lent, Arun Swami and Jennifer Widom, Clustering Association Rules, Proceedings of ICDE'97, Birmingham, England, 1997. Information Systems (JIIS), Kluwer Academic Publishers, Vol.27, No.2, 2006.
Xindong Wu, Chengqi Zhang and Shichao Zhang, Efficient Mining of Both Positive and Negative Association Rules. ACM Transactions on Information Systems, 22(3): 381-405, 2004. (SCI).
Guozhu Dong and Jinyan Li, Efficient Mining of Emerging Patterns: Discovering Trends and Differences, KDD 1999: 43-52.
Jiong Yang, Wei Wang, Philip S. Yu: Infominer: mining surprising periodic patterns. KDD 2001: 395-400
Wan, Q. and An, A. Discovering Transitional Patterns and Their Significant Milestones in Transaction Databases, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol.21, No.12, 2009.

Frequent Sequence Mining

M.J. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences, Machine Learning, Vol.42, No.1/2, 2001.
CloSpan: Mining Closed Sequential Patterns in Large Databases, Xifeng Yan, Jiawei Han, Ramin Afshar, Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May, 2003.

Data Stream Mining

Frequent item(set) Mining

Approximate Frequency Counts over Data Streams, by Gurmeet Singh Manku, Rajeev Motwani, in the International Conference on Very Large Data Bases (VLDB) 2002.
M. Charikar, K. Chen and M. Farach-Colton. Finding Frequent Items in Data Streams. International Colloquium on Automata, Languages, and Programming (ICALP '02), 508-515.
Finding Recent Frequent Itemsets Adaptively over Online Data Streams, by Joong Hyuk Chang, Won Suk Lee, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.

Classification

On Demand Classification of Data Streams, Aggarwal, Han, Wang, and Yu, KDD'04.
Mining Time-Changing Data Streams, by Geoff Hulten, Laurie Spencer, Pedro Domingos, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.
G. Widmer and M. Kubat. Learning in the Presence of Concept Drift and Hidden Contexts, Machine Learning, 23(1):69-101, 1996.
F. Ferrer-Troyano, J. Aguilar-Ruiz and J. Riquelme, Incremental Rule Learning and Border Examples Selection from Numerical Data Streams, J. of Universal Computer Science, 11(8), 2005.
M. Maloof and R. Michalski, Incremental learning with partial instance memory, Artificial Intelligence, Vol.154, Issue 1-2, April 2004.
H. Wang, W. Fan, P. Yu and J. Han. Mining Concept-drifting Data Streams Using Ensemble Classifiers, Proceedings of ACM SIGKDD Conference, 2003.
D. Sotoudeh and A. An. Partial Drift Detection Using a Rule Induction Framework, Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, Canada, October 26-30, 2010.

Clustering

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Clustering Evolving Data Streams. Proceedings of the International Conference on Very Large Data Bases (VLDB) 2003.
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta, Distance Measures for Effective Clustering of ARIMA Time Series, ICDM'01.

Concept Drift Detection

Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, and Ke Yi, An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams, Proceedings of the 38th Symposium on the Interface of Statistics, Computing Science, and Applications, pages 1-24, 2006.
P. Vorburger and A. Bernstein. Entropy-based concept shift detection. Proceedings of the Sixth International Conference on Data Mining, pages 1113-1118, 2006.

Social Networks and Graph Mining

Mehdi Kargar and Aijun An, Discovering Top-k Teams of Experts with/without a Leader in Social Networks, Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11), Glasgow, U.K., October 24-28, 2011, pp. 985-994.
Manuel Gomez-Rodriguez, Jure Leskovec and Andreas Krause, Inferring Networks of Diffusion and Influence, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
Jie Tang, Jimeng Sun, Chi Wang and Zi Yang, Social Influence Analysis in Large-scale Networks, Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'09), 2009.
D. Crandall, et al., Feedback Effects between Similarity and Social Influence in Online Communities, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'08), 2008.
Theodoros Lappas, Kun Liu and Evimaria Terzi. Finding a Team of Experts in Social Networks. SIGKDD 2009
Hanghang Tong, Christos Faloutsos. Center-piece subgraphs: problem definition and fast solutions. SIGKDD 2006
Yehuda Koren, Stephen C. North, Chris Volinsky. Measuring and extracting proximity graphs in networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(3), 2007.
Efficiently mining frequent trees in a forest, Mohammed J. Zaki, KDD 2002.
Frequent Subgraph Discovery, Michihiro Kuramochi and George Karypis, ICDM, 2001.
Substructure Similarity Search in Graph Databases. Xifeng Yan, Philip Yu, Jiawei Han, SIGMOD'05.
Frequent Subtree Mining - An Overview, Yun Chi, Siegfried Nijssen, Richard Muntz, Joost Kok, Fundamenta Informaticae Special Issue on Graph and Tree Mining, 2005.

Topic Detection and Tracking

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent dirichlet allocation, J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
D. Blei, J. McAuliffe. . Neural Information Processing Systems 21, 2007
X. Wang and A. McCallum, Topics over time: a non-markov continuous-time model of topical trends, in Proceedings of the 12th ACM SIGKDD, 2006, pp. 424–433.
C. Wang, D. Blei, and D. Heckerman. Continuous time dynamic topic models. In Uncertainty in Artificial Intelligence [UAI], 2008.
L. AlSumait, D. Barbara, and C. Domeniconi, On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking, in Proceedings of the 8th IEEE ICDM, 2008, pp. 3–12. (chosen by Morteza Zihayat)

Opinion Mining

C. Tan et al., User-Level Sentiment Analysis Incorporating Social Networks, KDD'11, 2011
Murthy Ganapathibhotla and Bing Liu. Mining Opinions in Comparative Sentences. Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008. (chosen by Nadine Dulisch)
Xiaowen Ding, Bing Liu and Philip S. Yu. A Holistic Lexicon-Based Approach to Opinion Mining. Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA. (chosen by Stephen Voland)

Decision Tree Learning

Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti. RainForest: A framework for fast decision tree construction of large datasets, In VLDB'98, pp. 416-427, New York, NY, 1998.
Learning Trees and Rules with Set-valued Features, William W. Cohen, Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996.
Cesar Ferri, Peter Flach and Jose Hernandez-Orallo, Learning Decision Trees Using the Area Under the ROC Curve, Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, July 2002, pp.139-146.

Decision Rule Learning

Quinlan, J. R. and Cameron-Jones, R. M. FOIL: A Midterm Report. Proc. of ECML, Vienna, Austria, 1993, pp. 3-20.
Linyan Wang and Aijun An, Fast counting with AV-Space for Efficient Rule Induction, Proceedings of the SIAM International Conference on Data Mining (SDM'07), Minneapolis, Minnesota, April 26-28, 2007.

Support Vector Machines

B. Boser, I. Guyon and V.N. Vapnik. A Training Algorithm for Optimal Margin Classifiers, Proc. of Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, 1992.

Learning from Imbalanced Datasets

PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection), Ramesh Agarwal and Mahesh V. Joshi, 2001.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., SMOTE: Synthetic Minority Over-sampling TEchnique, Journal of Artificial Intelligence Research, 16, 2002, 321-357.

Active Learning

P. Melville, S.M. Yang, M. Saar-Tsechansky, and R. Mooney. Active learning for probability estimation using Jensen-Shannon divergence. In Proceedings of the European Conference on Machine Learning (ECML), pages 268–279. Springer, 2005.
C. Körner and S. Wrobel. Multi-class ensemble-based active learning. In Proceedings of the European Conference on Machine Learning (ECML), pages 687–694. Springer, 2006.

Clustering

CACTUS-Clustering Categorical Data Using Summaries, Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 73-83.
Clustering Large Datasets in Arbitrary Metric Spaces, Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French, Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 502-511.
ROCK: A Robust Clustering Algorithm for Categorical Attributes, Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 512-521.
BIRCH: an efficient data clustering method for very large databases, Tian Zhang, Raghu Ramakrishnan, Miron Livny, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996, pp. 103-114.
CURE: An Efficient Clustering Algorithm for Large Databases, Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings of the ACM SIGMOD Conference, 1998.

Web Mining

Sundaresan, Neel and Yi, Jeonghee (2000). Mining the Web for Relations, Proceedings of the 9th International World Wide Web Conference, Computer Networks: The International Journal of Computer and Telecommunications Networking, Amsterdam, The Netherlands, pages 699-711. (chosen by Hongbin Lu)
Larry Page, Sergey Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report, Computer Science Department, Stanford University, 1998. (chosen by Hao Zhong)
J. Kleinberg, Authoritative sources in a hyperlinked environment, In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York, 1998.
Data mining of user navigation patterns, J. Borges and M. Levene, In Web Usage Analysis and User Profiling, pp. 92-111. Published by Springer-Verlag as Lecture Notes in Computer Science, Vol. 1836, 2000. (chosen by Sanjay Kaushik)

Privacy Preserving Data Mining

Privacy Preserving Mining of Association Rules, by A. Evfimievski, R. Srikant, R. Agrawal and J. Gehrke, KDD 2002.
Using Randomized Response Techniques for Privacy-Preserving Data Mining, by Wenliang Du and Zhijun Zhan, SIGKDD 2003.
Privacy-Preserving K-Means Clustering over Vertically Partitioned Data, by Jaideep Vaidya and Chris Clifton, SIGKDD 2003.
Collaborative Filtering with Privacy, by John Canny, IEEE S&P 2002.