| Keynote Speakers |
Title: Mining Billion-node Graphs: Patterns, Generators and Tools

Abstract: What do graphs look like? How do they evolve over time? How can we handle a graph with a billion nodes? We present a comprehensive list of static and temporal laws, and some recent observations on real graphs (e.g., "eigenSpokes"). For generators, we describe some recent ones that naturally match all of the known properties of real graphs. Finally, for tools, we present "oddBall" for discovering anomalies and patterns, as well as an overview of the PEGASUS system, which is designed for handling billion-node graphs and runs on top of the Hadoop system.

Biography: Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, fifteen "best paper" awards, and four teaching awards. He has served as a member of the executive committee of SIGKDD and has published over 200 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 30 tutorials and over 10 invited distinguished lectures. His research interests include data mining for graphs and streams, fractals, database performance, and indexing for multimedia and bio-informatics data.

Title: Assessing the Significance of Groups in High-Dimensional Data

Abstract: We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification, where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation to one of the groups. Even in this case with labelled training data, care has to be taken with the estimation of the error rate, at least for high-dimensional data, to avoid an overly optimistic assessment due to selection biases. In the case of unlabelled data, the problem of assessing whether groups identified by some data mining or cluster analytic procedure are genuine can be quite challenging, in particular for a large number of variables. We shall focus on the use of a resampling approach to this problem, applied in conjunction with factor analytic models for the generation of the bootstrap samples under the null hypothesis for the number of groups. The proposed methods will be demonstrated on some high-dimensional data sets from the bioinformatics literature.

Biography: Geoff McLachlan is Professor of Statistics in the Department of Mathematics and a Professorial Research Fellow in the Institute for Molecular Bioscience at the University of Queensland. He is also a chief investigator in the ARC Centre of Excellence in Bioinformatics. He currently holds an ARC Professorial Fellowship and is a member of the ARC College of Experts. He has written numerous research articles and six monographs, the last five in the Wiley Series in Probability and Statistics. His current research interests are focussed in the fields of machine learning and bioinformatics.

Title: 10 Years of Data Mining Research: Retrospect and Prospect

Abstract:

Biography: Xindong Wu is a Professor in the Department of Computer Science at the University of Vermont (USA) and a Yangtze River Scholar in the School of Computer Science and Information Engineering at the Hefei University of Technology (China). He holds a PhD in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published over 200 refereed papers in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, DMKD, KAIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW, as well as 25 books and conference proceedings.



