T3: Data Mining Meets the Internet: Techniques for Web Information Retrieval and Network Data Management

Rajeev Rastogi, Bell Laboratories, Lucent Technologies
Minos Garofalakis, Bell Laboratories, Lucent Technologies

This tutorial comprises two parts, each discussing an important application domain for data mining techniques on the Internet. The first part focuses on Web information retrieval. Modern Web search tools are plagued by a number of problems, including result abundance, limited coverage, restricted query interfaces, and limited customization. We discuss how data mining techniques can be used to organize Web content in a more structured fashion and to improve the quality of search. Topics covered will include the use of hubs and authorities to discover hyperlinked communities, automatic classification of Web documents, Web page clustering, similar-image retrieval, and the impact of XML on Web information retrieval.

The second part of the tutorial concentrates on applications of data mining techniques in the processing and analysis of network management data. Data mining ideas can provide an effective way of dealing with the abundance of management (e.g., traffic, billing) data collected during the operation of a large-scale network. We discuss the use of mining techniques for semantic compression and fast query processing over network management data. We also consider applications of data mining in traffic management and fault management, the two network management tasks that are essential for reliable and fast content delivery on the Internet.

Presenter Bios

Minos Garofalakis is a Member of Technical Staff at the Information Sciences Research Center of Bell Laboratories, Lucent Technologies. He received his B.Sc. in 1992 (Valedictorian, College of Engineering) from the Computer Engineering and Informatics Dept. of the University of Patras (UOPCEID). He also spent the following year at UOPCEID as a post-graduate fellow. In the Fall of 1993, he joined the graduate program in Computer Sciences at the University of Wisconsin-Madison, where he received his M.Sc. (1994) and Ph.D. (1998). He joined Bell Laboratories in Murray Hill, New Jersey, in September 1998.

Minos Garofalakis' current research interests lie in the areas of data reduction and mining, data warehousing, approximate query processing, network management, and Internet databases. He is a member of ACM and IEEE, and has served as a program committee member for ACM SIGMOD'2001 and other conferences/workshops in the database area.

Rajeev Rastogi is the Director of the Internet Management Research Department at Bell Laboratories, Lucent Technologies. He received the B.Tech. degree in Computer Science from the Indian Institute of Technology, Bombay, in 1988, and the master's and Ph.D. degrees in Computer Science from the University of Texas, Austin, in 1990 and 1993, respectively. He joined Bell Laboratories in Murray Hill, New Jersey, in 1993 and became a Distinguished Member of Technical Staff (DMTS) in 1998.

Rajeev Rastogi is active in the field of databases and has served as a program committee member for several conferences in the area. His writings have appeared in a number of ACM and IEEE publications and other professional conferences and journals. His research interests include database systems, storage systems, knowledge discovery and network management. His most recent research has focused on the areas of network management, data mining, high-performance transaction systems, and continuous-media storage servers.