SIAM Conference on Data Mining

Top Ten Data Mining Mistakes -- and How to Avoid Them


John Elder, Ph.D.
Chief Scientist, Elder Research, Inc.


The tutorial will reveal the top mistakes we Data Miners can make, from the simple to the subtle, using case studies of real projects and the (often overlooked) symptoms that suggested something might be amiss. The goal will be to learn "best practices" from their flip side -- mistakes -- though we should also have time for brief summaries of how to do it right. Mistakes to be covered: Lack data, Focus on training, Rely on one technique, Ask the wrong question, Listen (only) to the data, Accept leaks from the future, Discount pesky cases, Extrapolate (practically and theoretically), Answer every inquiry, Sample without care, Believe the best model.


The best background for attendees is a problem they want to solve and some experience trying any analysis technique. We'll focus on how to think rightly about a problem, not on technical equations or terms. The practical illustrations emphasize the "uncommon common sense" necessary to practice the art of Data Mining well.


Dr. John Elder heads a small Data Mining firm with offices in Charlottesville, Virginia, and Washington, DC. John earned degrees in Electrical Engineering at Rice University, then worked in the Defense consulting industry for five years, where he authored an early Data Mining tool for the Air Force that led to improved guidance and flight control applications. He then earned a Ph.D. in Systems Engineering from the University of Virginia while working as Director of Research for an investment management firm, and wrote an influential tool for global optimization. After two years of postdoctoral research at Rice in the Computational and Applied Mathematics Department, John returned to Virginia and started Elder Research, Inc. in 1995, where he has led projects successfully applying Data Mining to a wide variety of financial, commercial, and medical applications -- including cross-selling, customer segmentation, direct marketing, credit scoring, sales forecasting, stock selection, drug efficacy, biometrics, market timing, and fraud detection. Dr. Elder has written several book chapters and articles on pattern discovery techniques, and is a frequently invited conference speaker. He is active on Statistical and Engineering journals and boards, and his popular Data Mining courses are acclaimed for clarity. He has been named to Who's Who in the World for his contributions to the field. Since Fall 2001, Dr. Elder has been honored to serve on a panel formed by Congress to guide critical defense technology for the National Security Agency.