SIAM Conference on Data Mining

Analyzing Medical Patient Data: Challenges, Results, and Future Directions


R. Bharat Rao
Department Head, Clinical CAD & Data Mining
Computer Aided Diagnosis & Therapy
Siemens Medical Solutions, Inc. USA
Malvern, PA 19355


The last century has seen an exponential increase in the accuracy and sensitivity of diagnostic tests: from observing external symptoms, to increasingly sophisticated laboratory tests, to complex imaging methods that permit detailed non-invasive internal examinations, to, in the very near future, the use of genomic and proteomic analysis. This improved accuracy has inevitably resulted in an exponential increase in the patient data available to the physician. Furthermore, medical knowledge is expanding, with physicians being bombarded with an array of new diagnostic tests, clinical guidelines on how to diagnose and treat patients, and evidence-based results from clinical trials, all of which are regularly updated and modified. Both these trends, the increase in patient information and the growth of scientific knowledge, will only intensify, particularly with genomic and proteomic analysis soon becoming part of mainstream medicine.

There is a tremendous opportunity for data mining methods to help the physician deal with this flood of patient information and scientific knowledge. Existing tools for computer-aided detection are aimed primarily at the radiologist: they help improve accuracy by calling the radiologist’s attention to “actionable” structures in an image. These tools, although tremendously useful, represent the very tip of the iceberg. Data mining and machine learning can potentially help all physicians (not just radiologists) in a variety of ways: by helping interpret complex diagnostic tests, by combining information from multiple sources (images, clinical data, proteomics, scientific knowledge), by providing support for differential diagnosis, by suggesting treatments, and by providing patient-specific prognoses.

Another relevant trend that will increasingly dominate medicine in the next decades is the push towards “evidence-based” medicine (based largely upon the results of clinical trials and empirical evidence) as opposed to traditional experience-based medicine (based upon the individual physician’s knowledge and experiences). Due to the huge cost of clinical trials, it is increasingly important to mine the patient data already collected in institutions; however, this presents many difficulties due to the poor quality of such patient records.

This tutorial will describe the state of the art in computerized decision-support tools for medicine. To be successful, such tools must not only be accurate but also blend seamlessly into the physician’s normal workflow. We will then describe some of the performance challenges that data mining faces in developing this next generation of tools. The tutorial will also address some of the issues in mining patient data already collected in medical institutions, and will conclude with some thoughts about the future of computer-aided decision support in medicine.


The expected audience for this tutorial is KDD practitioners interested in biomedical informatics. No familiarity with the area is assumed, but attendees are expected to have basic knowledge of computer science and KDD.


Dr. R. Bharat Rao is the Head of the Clinical CAD and Data Mining R&D Department at the Computer-Aided Diagnosis and Therapy Solutions Group in Siemens Medical Solutions, Malvern, PA. He received his Ph.D. in machine learning from the Department of Electrical Engineering, University of Illinois, Urbana-Champaign, in 1993. Dr. Rao joined Siemens Corporate Research in 1993 and managed its Data Mining group from 1996, when he began his research in medical data mining. In 2002, he joined the newly formed Computer-Aided Diagnosis & Therapy Group in Siemens Medical Solutions, with a particular focus on using clinical patient information and data mining methods to help improve traditional computer-aided detection methods.

His current research interests (and those of the Clinical CAD & Data Mining R&D Department) focus on the use of machine learning and probabilistic inference to develop decision-support tools that can help physicians improve both the quality of patient care and their own efficiency. He is particularly interested in the development of novel data mining methods to collectively mine and integrate the various parts of a patient record (lab tests, pharmacy, free text, images, proteomics, etc.) and in the integration of medical knowledge into the mining process. Dr. Rao has published many papers on machine learning and data mining, and will be the Industrial/Government track Program Co-Chair at KDD-2004.