The Statistics of Gapped Sequence Alignments
Sequence comparison is indispensable to modern molecular biology. For example, biologists use the BLAST program over the web more than once a second to compare their query sequences against databases. If a query matches a database sequence of known function with a small p-value, the biological function of the query can be inferred. At present, no on-line method can compute p-values to the accuracy that BLAST requires, so sequence comparisons are restricted to settings whose statistical parameters have been pre-computed, to the detriment of certain types of database retrieval applications. Over the past two years, my group has reduced the simulation time required to estimate BLAST statistical parameters from about two days to about two seconds, bringing on-line estimation within reach. Our mathematical methods also raise many interesting conjectures.
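As a rough illustration of what the "statistical parameters" are, the sketch below assumes the standard Karlin-Altschul form used by BLAST: the expected number of chance alignments scoring at least S between sequences of lengths m and n is E = K m n e^(-λS), and the p-value is p = 1 - e^(-E). For gapped alignments, λ and K have no closed form and must be estimated, which is what the simulations described above do. The function names and the parameter values in the usage note are hypothetical and for illustration only; edge-effect (length) corrections are omitted.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, K: float) -> float:
    """Expected number of chance local alignments scoring >= score.

    Assumes the basic Karlin-Altschul form E = K*m*n*exp(-lam*score),
    with no edge-effect correction to the lengths m and n.
    """
    return K * m * n * math.exp(-lam * score)


def pvalue(score: float, m: int, n: int, lam: float, K: float) -> float:
    """Probability of at least one chance alignment scoring >= score.

    Assumes the number of such alignments is Poisson with mean E,
    so p = 1 - exp(-E); expm1 keeps precision when E is tiny.
    """
    return -math.expm1(-evalue(score, m, n, lam, K))


if __name__ == "__main__":
    # Hypothetical lambda and K, standing in for simulated estimates.
    lam, K = 0.267, 0.041
    m, n = 250, 1_000_000  # query length, total database length
    for s in (40, 60, 80):
        print(s, evalue(s, m, n, lam, K), pvalue(s, m, n, lam, K))
```

When E is small, p is approximately E, which is why BLAST reports often quote E-values and p-values almost interchangeably for strong matches.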
John Spouge, National Center for Biotechnology Information (NCBI), National Library of Medicine