Title: Mining Billion-node Graphs: Patterns, Generators and Tools

Abstract:

What do graphs look like? How do they evolve over time?

How can we handle a graph with a billion nodes?

We present a comprehensive list of static and temporal laws, and some recent observations on real graphs (e.g., "eigenSpokes"). For generators, we describe some recent ones, which naturally match all of the known properties of real graphs. Finally, for tools, we present "oddBall" for discovering anomalies and patterns, as well as an overview of the PEGASUS system, which is designed for handling billion-node graphs and runs on top of the Hadoop system.
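PEGASUS is built around a single primitive, GIM-V (Generalized Iterated Matrix-Vector multiplication), in which the usual "multiply, sum, assign" steps of a matrix-vector product are replaced by user-supplied operations (combine2, combineAll, assign). The sketch below is a minimal single-machine illustration of that idea, not PEGASUS code and not a Hadoop job: it assumes an unweighted edge list and plain Python dictionaries, and the function names `gim_v` and `connected_components` are hypothetical. Instantiating the three operations as "take the neighbour's label", "minimum" and "keep the smaller label" turns repeated GIM-V steps into connected-component labelling.

```python
from typing import Callable, Dict, List, Tuple


def gim_v(
    edges: List[Tuple[int, int]],
    v: Dict[int, int],
    combine2: Callable[[int, int], int],
    combine_all: Callable[[List[int]], int],
    assign: Callable[[int, int], int],
) -> Dict[int, int]:
    """One GIM-V step: v'_i = assign(v_i, combineAll_j(combine2(m_ij, v_j)))."""
    partial: Dict[int, List[int]] = {i: [] for i in v}
    for i, j in edges:
        # m_ij = 1 for every edge, since this sketch assumes an unweighted graph
        partial[i].append(combine2(1, v[j]))
    # Nodes with no incoming contributions keep their current value
    return {
        i: assign(v[i], combine_all(vals)) if vals else v[i]
        for i, vals in partial.items()
    }


def connected_components(num_nodes: int, edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Label each node with the smallest node id in its component (hypothetical helper)."""
    # Treat the graph as undirected by adding the reverse of every edge
    sym = edges + [(j, i) for i, j in edges]
    v = {i: i for i in range(num_nodes)}  # every node starts with its own id
    while True:
        new_v = gim_v(
            sym,
            v,
            combine2=lambda m, x: x,  # pass the neighbour's label through
            combine_all=min,          # smallest label among all neighbours
            assign=min,               # keep the smaller of old and proposed label
        )
        if new_v == v:                # fixed point: labels stopped changing
            return v
        v = new_v


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

Swapping in other operations (for example, weighted sums for PageRank-style iterations) reuses the same loop, which is the design point of a single generalized primitive; distributing each step over a cluster is what the real system adds.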

Biography:

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, fifteen "best paper" awards, and four teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 200 refereed articles, 11 book chapters and one monograph. He holds five patents and he has given over 30 tutorials and over 10 invited distinguished lectures. His research interests include data mining for graphs and streams, fractals, database performance, and indexing for multimedia and bio-informatics data.

Title: Assessing the Significance of Groups in High-Dimensional Data

Abstract:

We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification, where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation to one of the groups. Even in this case with labelled training data, care has to be taken with the estimation of the error rate, at least for high-dimensional data, to avoid an overly optimistic assessment due to selection biases. In the case of unlabelled data, the problem of assessing whether groups identified by some data mining or cluster analytic procedure are genuine can be quite challenging, in particular for a large number of variables. We shall focus on the use of a resampling approach to this problem, applied in conjunction with factor analytic models for the generation of the bootstrap samples under the null hypothesis for the number of groups. The proposed methods will be demonstrated on some high-dimensional data sets from the bioinformatics literature.
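The selection-bias warning above can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the method of the talk: it uses synthetic pure-noise data (so the true error rate is 50%), a hypothetical mean-difference feature ranking `top_k_features`, and a hypothetical nearest-centroid classifier. It compares cross-validation in which features are chosen once on all the data (leaking test labels into the selection) against cross-validation that repeats the selection inside each training fold.

```python
import numpy as np


def top_k_features(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Rank features by absolute difference in class means (assumed scoring rule)."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]


def nearest_centroid_predict(X_train, y_train, X_test) -> np.ndarray:
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_error(X, y, k, n_folds=5, select_inside=True, seed=0) -> float:
    """K-fold CV error; select_inside=False reproduces the selection-bias mistake."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Selecting on ALL samples lets the test labels influence which features are kept
    global_feats = None if select_inside else top_k_features(X, y, k)
    errors = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        if select_inside:
            feats = top_k_features(X[train_idx], y[train_idx], k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(
            X[train_idx][:, feats], y[train_idx], X[test_idx][:, feats]
        )
        errors += int((pred != y[test_idx]).sum())
    return errors / len(y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 2000))  # 40 samples, 2000 pure-noise variables
    y = np.repeat([0, 1], 20)            # labels carry no real signal
    print("selection outside CV:", cv_error(X, y, k=20, select_inside=False))
    print("selection inside CV: ", cv_error(X, y, k=20, select_inside=True))
```

With many more variables than samples, the first estimate typically comes out far below 50% even though the groups are not genuinely separated, while the second stays near chance; this is the overly optimistic assessment the abstract cautions against.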

Biography:

Geoff McLachlan is Professor of Statistics in the Department of Mathematics and a Professorial Research Fellow in the Institute for Molecular Bioscience at the University of Queensland. He is also a chief investigator in the ARC Centre of Excellence in Bioinformatics. He currently holds an ARC Professorial Fellowship and is a member of the ARC College of Experts. He has written numerous research articles and six monographs, the last five in the Wiley Series in Probability and Statistics. His current research interests focus on the fields of machine learning and bioinformatics.

Title: 10 Years of Data Mining Research: Retrospect and Prospect

Abstract:
This talk reviews 10 research activities associated with ICDM over the past 10 years, discusses 10 achievements in the data mining community, and presents 10 research topics for the next 10 years.

Biography:

Xindong Wu is a Professor in the Department of Computer Science at the University of Vermont (USA) and a Yangtze River Scholar in the School of Computer Science and Information Engineering at the Hefei University of Technology (China). He holds a PhD in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published over 200 refereed papers in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, DMKD, KAIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW, as well as 25 books and conference proceedings.

Dr. Wu is the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), the founder and current Editor-in-Chief of Knowledge and Information Systems (KAIS, by Springer), the Founding Chair (2002-2006) of the IEEE Computer Society Technical Committee on Intelligent Informatics (TCII), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He was the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE, by the IEEE Computer Society) between January 1, 2005 and December 31, 2008. He served as Program Committee Chair for ICDM '03 (the 2003 IEEE International Conference on Data Mining) and as Program Committee Co-Chair for KDD-07 (the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), and is currently serving as Program Committee Chair (Knowledge Management Track) for CIKM 2010 (the 19th ACM Conference on Information and Knowledge Management).


Professor Wu is the 2004 ACM SIGKDD Service Award winner and the 2006 IEEE ICDM Outstanding Service Award winner. He has been an invited/keynote speaker at numerous international conferences including IEEE IRI 2010, IEEE GrC 2009, IDEAL 2009, JCKBSE 2008, HAIS 2008, NSF-NGDM'07, PAKDD-07, IEEE EDOC'06, IEEE ICTAI'04, IEEE/WIC/ACM WI'04/IAT'04, SEKE 2002, and PADD-97.
