\chapter{Obstructive Sleep Apnea}
\label{chap:apnea}
\footnotetext{This chapter is available as tech report \#LA-UR-07-1969
from Los Alamos National Laboratory.}
\index{obstructive sleep apnea}
\index{apnea|see{obstructive sleep apnea}}%
The challenge for the \emph{Computers in Cardiology} %
\index{Computers in Cardiology 2000 (CINC2000)}%
meeting in 2000 (CINC2000) %
was a competition to identify \emph{obstructive sleep apnea} on the
basis of electrocardiograms alone. My colleague James
\index{McNames}{McNames} at Portland State University and I won the
prize for the best minute-by-minute analysis of data. I prepared our
first entry using HMMs, and James prepared the subsequent (and
winning) entries by hand using spectrograms\footnote{A spectrogram is a
  display of power in Fourier spectral bands as a function of time.
The x-axis is time, the y-axis is frequency, and the image intensity
at point $(t,f)$ is the power in that frequency estimated in a
window centered at that time.} to visualize the data.
\index{spectrogram} In preparing the first entry, I discovered that
using a sequence of states from the Viterbi algorithm to deduce a
sequence of classifications produced obvious errors. After some
thought, I realized that a variant of the Viterbi algorithm that
directly decoded sequences of \emph{classifications} rather than
sequences of \emph{states} would yield better classification results.
Although I did not implement such an algorithm in 2000, I have in
preparing this book.
While the more appropriate algorithm does not perform as well as
James' eye, it does perform well enough to place among the top scores
that appear on the \index{PhysioNet}{PhysioNet}
website\footnote{\url{http://www.physionet.org/challenge/2000/top-scores.shtml}}.
In this final chapter, I will describe the algorithm for decoding
sequences of classifications and narrate, in a less formal style, my
attempt at using it to \emph{re-win} the contest.
\section{The Challenge and the Data}
\label{sec:challenge}
The PhysioNet
website\footnote{\url{http://www.physionet.org/challenge/2000}}
announced the challenge and described the data as follows:
{\it % italic to signify quote
{\rm \textbf{Introduction:}}
%
Obstructive sleep apnea (intermittent cessation of breathing) is a
common problem with major health implications, ranging from
excessive daytime drowsiness to serious cardiac arrhythmias.
Obstructive sleep apnea is associated with increased risks of high
blood pressure, myocardial infarction, and stroke, and with
increased mortality rates. Standard methods for detecting and
quantifying sleep apnea are based on respiration monitoring, which
often disturbs or interferes with sleep and is generally expensive.
A number of studies during the past 15 years have hinted at the
possibility of detecting sleep apnea using features of the
electrocardiogram. Such approaches are minimally intrusive,
inexpensive, and may be particularly well-suited for screening. The
major obstacle to use of such methods is that careful quantitative
comparisons of their accuracy against that of conventional
techniques for apnea detection have not been published.
We therefore offer a challenge to the biomedical research community:
demonstrate the efficacy of ECG-based methods for apnea detection
using a large, well-characterized, and representative set of data.
The goal of the contest is to stimulate effort and advance the state
of the art in this clinically significant problem, and to foster
both friendly competition and wide-ranging collaborations. We will
award prizes of US\$500 to the most successful entrant in each of two
events.
{\rm \textbf{Data for development and evaluation:}}
%
Data for this contest have kindly been provided by Dr.~Thomas
Penzel \index{Penzel, Thomas} of Philipps-University, Marburg,
Germany [available on the website].
The data to be used in the contest are divided into a learning set
and a test set of equal size. Each set consists of 35 recordings,
containing a single ECG signal digitized at 100 Hz with 12-bit
resolution, continuously for approximately 8 hours (individual
recordings vary in length from slightly less than 7 hours to nearly
10 hours). Each recording includes a set of reference annotations,
one for each minute of the recording, that indicate the presence or
absence of apnea during that minute. These reference annotations
were made by human experts on the basis of simultaneously recorded
respiration signals. Note that the reference annotations for the
test set will not be made available until the conclusion of the
contest. Eight of the recordings in the learning set include three
respiration signals (oronasal airflow measured using nasal
thermistors, and chest and abdominal respiratory effort measured
using inductive plethysmography) each digitized at 20 Hz, and an
oxygen saturation signal digitized at 1 Hz. These additional signals
can be used as reference material to understand how the apnea
annotations were made, and to study the relationships between the
respiration and ECG signals. [...]
{\rm \textbf{Data classes:}}
%
For the purposes of this challenge, based on these varied criteria,
we have defined three classes of recordings:
\begin{description}
\item[Class A (Apnea):] These meet all criteria. Recordings in class
A contain at least one hour with an apnea index of 10 or more, and
at least 100 minutes with apnea during the recording. The learning
and test sets each contain 20 class A recordings.
\item[Class B (Borderline):] These meet some but not all of the
criteria. Recordings in class B contain at least one hour with an
apnea index of 5 or more, and between 5 and 99 minutes with apnea
during the recording. The learning and test sets each contain 5
class B recordings.
\item[Class C (Control):] These meet none of the criteria, and may
be considered normal. Recordings in class C contain fewer than 5
minutes with apnea during the recording. The learning and test
sets each contain 10 class C recordings.
\end{description}
{\rm \textbf{Events and scoring:}}
%
Each entrant may compete in one or both of the following events:
\begin{description}
\item[1. Apnea screening:] In this event, your task is to design
software that can classify the 35 test set recordings into class A
(apnea) and class C (control or normal) groups, using the ECG
signal to determine if significant sleep apnea is present. [...]
\item[2. Quantitative assessment of apnea:] In this event, your
software must generate a minute-by-minute annotation file for each
recording, in the same format as those provided with the learning
set, using the ECG signal to determine when sleep apnea occurs.
Your annotations will be compared with a set of reference
annotations to determine your score. Each annotation that matches
a reference annotation earns one point; thus the highest possible
score for this event will be approximately 16800 (480 annotations
in each of 35 records). It is important to understand that scores
approaching the maximum are very unlikely, since apnea assessment
can be very difficult even for human experts. Nevertheless, the
scores can be expected to provide a reasonable ranking of the
ability of the respective algorithms to mimic the decisions made
by human experts.
\end{description}
}%% end \it for quote
\subsection{The Data}
\label{sec:data}
Briefly, one can fetch the following records from PhysioNet:
\begin{description}
\item[a01-a20:] The \emph{a records} from individuals that display
\emph{apnea}
\item[b01-b05:] The \emph{b records}\footnote{The amplitude of the
\emph{b05} record varies dramatically over different segments of
time. I found it unusable and discarded it entirely.} from
individuals diagnosed as \emph{borderline}
\item[c01-c10:] The \emph{c records} from \emph{control} or normal
individuals
\item[a01er-a04er, b01er, and c01er:] Identical to \emph{a01-a04},
  \emph{b01}, and \emph{c01}, except augmented with respiration and
$SpO_2$ (percent of arterial hemoglobin saturated with oxygen)
\nomenclature[rSpO]{$SpO_2$}{Percent of arterial hemoglobin
saturated with oxygen.} signals
\item[summary\_of\_training:] Expert classifications of each minute in
the $a$, $b$, and $c$ records
\item[x01-x35:] The test set\footnote{The records \emph{x33} and
\emph{x34} are so similar that I suspect they are simultaneous
recordings from different ECG leads. I did not explicitly exploit
the similarity in my analysis.}; records without classification
\end{description}
In 2000, using the data and viewing software from PhysioNet, I
saw striking oscillations in the apnea time series. The patients stop
breathing for tens of seconds, gasp a few breaths, and stop again.
Each cycle takes about 45 seconds and can go on for most of the night.
I've plotted two periods of such an oscillation from record \emph{a03}
in Fig.~\ref{fig:a03erA}. The reference or \emph{expert}
classifications provided with the data indicate these are the last
oscillations in the first apnea episode of the night. Over the entire
night's record of 8 hours and 39 minutes, the patient had apnea for a
total of 4 hours and 5 minutes in 11 separate episodes. In that
time, more than once a minute, he was waking up enough to start
breathing. Ten and a half minutes after the end of the first apnea
episode, the record, as plotted in Fig.~\ref{fig:a03erN}, looks
normal.
\begin{figure}
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/a03erA.pdf}}
}
\caption[A segment of record a03]%
  {A segment of record a03. Two cycles of a large apnea-induced
    oscillation in $SpO_2$ are drawn in the lower plot. The
middle plot is the oronasal airflow signal, and the upper plot is
the ECG (units of both ONR and ECG are unknown). The time
axis is marked in \emph{hours:minutes}. Notice the increased
heart rate just after 0:58 and just before 0:59.}
\label{fig:a03erA}
\end{figure}
\begin{figure}
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/a03erN.pdf}}
}
\caption[A segment of record a03]%
{A segment of record a03 taken during a period of normal
respiration. Signals the same as in Fig.~\ref{fig:a03erA}.}
\label{fig:a03erN}
\end{figure}
In plots like Fig.~\ref{fig:a03erA}, the heart rate visibly increases
at the end of the gasping phase and then decreases during the phase of
interrupted respiration. That heart rate oscillation is the key I
used in my initial attempts to classify periods of apnea. In
Fig.~\ref{fig:a03erHR}, I've plotted both a heart rate derived from
the ECG \nomenclature[rECG]{ECG}{Electrocardiogram} and the $SpO_2$
signal. The oscillations in the signals track each other, and the
expert classifies only the region of large heart rate oscillation as
apnea.
\begin{figure}
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/a03erHR.pdf}}
}
  \caption[A segment of record a03 at the end of an episode of apnea]%
  {A segment of record a03 at the end of an episode of apnea with
indications in both the $SpO_2$ signal and the heart rate
\emph{(HR)} signal. The expert marked the time before 1:00 as
apnea and the time afterwards as normal.}
\label{fig:a03erHR}
\end{figure}
\section{First Classification Algorithms and Two Useful Features}
\label{sec:NVG}
\newcommand{\tmo}{i} % \tmo is t minus one
\newcommand{\tex}{j} % \tex is t exactly
\newcommand{\tpo}{k} % \tpo is t plus one
\subsection{Using Information from Experts to Train}
I first thought about the CINC2000 challenge when Andreas Rechtsteiner
told me that he was going to use it as a final project for a class
that James was teaching. I suggested that he try a two state HMM with
autoregressive observation models, \ie, a model like those described
in section~\ref{sec:ARVGaussian} but with scalar observations. The
performance of Andreas' two state model was not memorable.
To use the expert classification information in training, Andreas and
I found that we need only modify the observation model. The technique
is not limited to HMMs with only two states or two classes; it applies
to an arbitrary number of classes and to arbitrary numbers of states
associated with each class. At each time $t$, let $\ti{c}{t}$ denote
classification information from an expert about which states are
possible. Specifically $\ti{c}{t}$ is a vector that indicates that
some states are possible and that the others are impossible. We
simply replaced $P_{\ti{Y}{t}|\ti{S}{t},\ts{Y}{1}{t-1}}$ with
$P_{\ti{Y}{t},\ti{C}{t}|\ti{S}{t},\ts{Y}{1}{t-1}}$ wherever it occurs
in the Baum-Welch algorithm. Roughly, the modification forces the
system into states associated with the right class during training.
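As a minimal sketch of that modification (the array names
\texttt{p\_y\_given\_s}, \texttt{class\_labels}, and
\texttt{state\_class} are hypothetical, not from the original code),
the masked likelihoods can be computed as:

```python
import numpy as np

def masked_likelihoods(p_y_given_s, class_labels, state_class):
    """Replace P(y_t | s) with P(y_t, c_t | s) = P(y_t | s) g(s, c_t):
    zero the likelihood of every state inconsistent with the expert's
    class label at that time.

    p_y_given_s : (T, n_states) array of observation likelihoods
    class_labels: length-T sequence of expert class labels
    state_class : length-n_states sequence giving each state's class
    """
    mask = (np.asarray(state_class)[None, :]
            == np.asarray(class_labels)[:, None])
    return p_y_given_s * mask
```

Running Baum-Welch with these masked likelihoods in place of the plain
observation likelihoods forces the reestimation toward states of the
labeled class, as described above.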
\subsection{Nonlinear Dynamics}
\label{sec:NLD}
After Andreas finished his class, I looked more carefully at heart
rate time series and the classification errors of a two state model.
I saw evidence of \emph{nonlinear dynamics} such as the period two
waveform and the sawtooth in Fig.~\ref{fig:ApneaNLD}. Superpositions
of sinusoids can describe such patterns with the right amplitudes and
relative phases of the harmonics. Although a linear model can favor
the correct frequencies and amplitudes of the component sinusoids,
linear models cannot detect or enforce phase relationships. The
ability to distinguish phase relationships makes it possible to build
nonlinear models that are more discriminating than any linear model.
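A small numerical illustration of this point (mine, not part of the
original analysis): two signals with identical amplitude spectra but a
different relative phase between the fundamental and its second
harmonic have the same linear (power-spectral) description yet plainly
different waveforms.

```python
import numpy as np

# Fundamental plus second harmonic, with two different relative phases.
t = np.linspace(0.0, 1.0, 512, endpoint=False)
a = np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t)
b = np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t + np.pi / 2)

# Identical amplitude spectra ...
same_spectrum = np.allclose(np.abs(np.fft.rfft(a)),
                            np.abs(np.fft.rfft(b)), atol=1e-6)
# ... but markedly different waveforms.
waveform_gap = np.max(np.abs(a - b))
```

Here `same_spectrum` is true while `waveform_gap` is about 0.7, i.e.,
the two waveforms differ by roughly half the harmonic's peak-to-peak
amplitude even though no spectral measurement separates them.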
\begin{figure}
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/ApneaNLD.pdf}}
}
\caption[Nonlinear effects]%
{Nonlinear effects: The upper plot seems to be a period two
oscillation. The lower plot is approximately sawtooth.}
\label{fig:ApneaNLD}
\end{figure}
Excited by these observations, I built an HMM with many chains of
states. Each chain modeled a different form of heart rate
oscillation. I trained the model on the expertly classified records,
and then used the model to classify those same records. For each
record, I used the Viterbi algorithm to find the MAP state sequence
given the observations. Then I derived a classification sequence from
the decoded state sequence by selecting class $C$ at time $t$ if the
decoded state satisfied $\ti{s}{t}\in C$. The technique performed
terribly. The sequences of classifications said every minute was
\emph{normal} even though (actually \emph{because}) I had modeled the
different kinds of \emph{apnea} oscillations so carefully.
An analysis of the values of the conditional probabilities of states
given the entire sequence of observations, $P_{\ti{S}{t}|\ts{Y}{1}{T}}
\left(s_i|\ts{y}{1}{T} \right) = \alpha(s_i,t) \beta(s_i,t)$, revealed
the problem and suggested a solution. Because the model had many more
apnea states than normal states, the state probability was rarely
concentrated enough in a single apnea state for that state to be
decoded. To address the problem, I solved for the sequence of MAP
classifications, \ie,
\begin{equation}
\label{eq:smlc}
\ti{\tilde C}{t} \equiv \argmax_C \sum_{i:s_i\in C}
  P_{\ti{S}{t}|\ts{Y}{1}{T}} \left(s_i|\ts{y}{1}{T} \right),
\end{equation}
instead of solving for the MAP sequence of states.
In general if there are more than two classes, Eqn.~\eqref{eq:smlc}
can yield unlikely or even impossible sequences of classes. (See the
example in Subsection~\ref{sec:sequenceMAP}.)
Section~\ref{sec:V4Class} describes an algorithm that, rather than
finding the \emph{sequence of MAP classifications}, finds the \emph{MAP
  sequence of classifications}, \ie,
\begin{equation}
\label{eq:mlsc}
{\ts{\hat C}{1}{T}} \equiv \argmax_{\ts{C}{1}{T} }
\sum_{\ts{s}{1}{T}:\ti{s}{t}\in \ti{C}{t}, 1 \leq t \leq T} P\left(
\ts{s}{1}{T} | \ts{y}{1}{T} \right).
\end{equation}
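The simpler per-minute rule of Eqn.~\eqref{eq:smlc} is easy to sketch,
assuming the smoothed posteriors $P_{\ti{S}{t}|\ts{Y}{1}{T}}$ have
already been computed by the forward-backward algorithm (the array
names here are hypothetical):

```python
import numpy as np

def map_class_per_time(posterior, state_class, n_classes):
    """At each time, sum the smoothed state posterior over the states
    in each class and return the class with the largest total.

    posterior  : (T, n_states) array, posterior[t, s] = P(S_t = s | y_1^T)
    state_class: length-n_states class index for each state
    """
    state_class = np.asarray(state_class)
    class_prob = np.stack([posterior[:, state_class == c].sum(axis=1)
                           for c in range(n_classes)], axis=1)
    return class_prob.argmax(axis=1)
```

Note how a class whose probability mass is split over many states can
still win: with posteriors $(0.3, 0.3, 0.4)$ over two class-0 states
and one class-1 state, the rule picks class 0 even though the single
most probable state belongs to class 1.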
Using Eqn.~\eqref{eq:smlc} and the HMM with many chains of states, I
classified the test data to prepare a contest entry. When James
submitted the entry % on April 24, 2000
we found that it classified 78.91\% of the minutes correctly.
\subsection{The Excellent Eye of Dr.\ McNames}
\label{sec:mcnames}
To diagnose the errors, James printed spectrograms of every record.
Although the spectrograms discarded the phase information that I hoped
was important, they clearly indicated the oscillations that I had
tried to capture in my complicated HMMs as intense bands of power at
frequencies below 2 cpm (cycles per minute).%
\nomenclature[rcpm]{cpm}{Cycles per minute.}%
In addition to those low frequency oscillations, James noticed bands
of power between 10 and 20 cpm in the spectrograms (See
Fig.~\ref{fig:sgram}). That higher frequency power was evidence of
respiration. Using both of those features, he classified the test
data by hand. First he classified each entire record for \emph{event
1}. Since almost none of the minutes in the normal records are
apnea, he classified each minute in those records as normal. His
third attempt at \emph{event 2} was the best entry at the close of the
contest.
\begin{figure}
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/sgram.pdf}}
}
\caption[Information about respiration in high
frequency phase variations]%
{Information about respiration in high frequency phase variations.
This is the $a11$ record roughly between minutes 40 and 225. The
    upper plot is heart rate (bandpass filtered 0.09--3.66 cpm), the
middle plot is a spectrogram of the phase jitter in the heart
rate, and the lower plot is the expert classification. A single
band of spectral power between about 10 and 20 cpm without much
power below the band in the spectrogram indicates normal
respiration.}
\label{fig:sgram}
\end{figure}
The question that motivated the contest is ``Is the information in an
ECG alone sufficient to classify apnea?'' James' work answered the
question affirmatively. For attempts to do the classification
automatically, his work suggests the following points:
\begin{enumerate}
\item Classify each entire record first. Then, using that
classification, classify each minute in the record.
\item Heart rate oscillations at about 1.3 cpm indicate apnea.
\item A clean phase jitter signal at about 14 cpm
indicates normal respiration.
\end{enumerate}
\section{Decoding Sequences of Classifications}
\label{sec:V4Class}
In this section, I present a modified Viterbi
algorithm that finds the MAP sequence of classifications given
a model and a sequence of observations. Before a brief discussion,
let me introduce the following notation and equations:
\subsubsection*{Notation:}
\begin{description}
\item[$S^*$] The set of state sequences of arbitrary length.
Likewise, $C^*$ is the set of classification sequences of arbitrary
length.
\item[$g:S^*\times C^* \mapsto \{0,1\}$:] A masking function,
$g(\ts{s}{1}{t},\ts{c}{1}{t}) = 1$ is equivalent to ``$\ts{s}{1}{t}$ is
consistent with $\ts{c}{1}{t}$''. The principal application is
\begin{equation*}
P_{} \left(\ti{y}{t},\ti{c}{t}|\ti{s}{t} \right) = P_{}
\left(\ti{y}{t}|\ti{s}{t} \right) g(\ti{s}{t},\ti{c}{t})
\end{equation*}
\item[$ \ts{\hat c}{1}{t} (c)$:] The best classification sequence ending
in $c$
\begin{equation*}
\ts{\hat c}{1}{t} (c) \equiv \argmax_{\ts{c}{1}{t}:\ti{c}{t}=c}
P\left(\ts{c}{1}{t}, \ts{y}{1}{t}\right)
\end{equation*}
\item[$\nu(t,c)$:] The \emph{utility} of the best sequence ending in $c$
\begin{equation*}
\nu(t,c) \equiv \log\left(P_{} \left(\ts{y}{1}{t},\ts{\hat c}{1}{t}(c)
\right) \right)
\end{equation*}
\item[$f(t+1,s,c')$:] The probability of the state $s$ at time $t+1$,
and output $y(t+1)$ given class $c'$ at time $t$, the best
classification sequence leading to $c'$, and all previous outputs
\begin{equation*}
f(t+1,s,c') \equiv P\left(s,\ti{y}{t+1} | y_1^{t}, \ts{\hat c}{1}{t}(c')\right)
\end{equation*}
\item[$\phi(t,s,c)$:] The probability of state $s$ at time $t$ given
the best classification sequence ending in $c$ at time $t$ and
$\ts{y}{1}{t}$.
\begin{equation*}
\phi(t,s,c) \equiv P_{} \left(s|\ts{y}{1}{t},\ts{\hat c}{1}{t}(c) \right)
\end{equation*}
\item[$c'\left(c,t+1\right)$:] The best $\ti{c}{t}$ given $\ti{c}{t+1}=c$
\begin{equation*}
c'\left(c,t+1\right) \equiv \argmax_{\bar c} P_{}
\left(\ts{y}{1}{t+1},\ts{\hat c}{1}{t}(\bar c),c \right)
\end{equation*}
\end{description}
%
\nomenclature[gf]{$\phi(t,s,c)$}{Used in discussing the Viterbi
algorithm for class sequences. The probability of state $s$ at time
$t$ given the best classification sequence ending in $c$ at time $t$
and $\ts{y}{1}{t}$.} % end nomenclature
\subsubsection*{Equations:}
\begin{align}
\ts{\hat c}{1}{t}(c) &= \argmax_{\ts{c}{1}{t}:\ti{c}{t}=c}
\sum_{\ts{s}{1}{t}} g(\ts{s}{1}{t},\ts{c}{1}{t})
P\left(\ts{s}{1}{t},\ts{y}{1}{t}\right) \\
\label{eq:Cat}
\ts{\hat c}{1}{t+1}(c) &= \left( \left[ \ts{\hat c}{1}{t} (c'(c,t+1))\right],c
\right) \\
\label{eq:f}
f(t+1,s,c') &= \sum_{s'}\phi(t,s',c') P(s|s')P(\ti{y}{t+1}|s)\\
\nu(t+1,c) &= \nu(t,c'(c,t+1)) +
\log\left(\sum_{s} g(s,c) f(t+1,s,c'(c,t+1)) \right) \\
\label{eq:phi}
\phi(t+1,s,c) & = \frac{f(t+1,s,c'(c,t+1))}
{\sum_{\bar s} g(\bar s,c) f(t+1,\bar s, c'(c,t+1))}
\end{align}
Equation \eqref{eq:Cat} says that the best classification sequence
ending in $c$ at time $t+1$ consists of $c$ concatenated with the best
sequence ending in $c'$ at time $t$, where $c'$ is the best
predecessor of $c$.
Equation \eqref{eq:f} follows from the model assumptions
\begin{equation*}
P(s|s',\ts{y}{1}{t},\ts{\hat c}{1}{t}(c')) =
P(s|s')
\end{equation*} and
\begin{equation*}
P(\ti{y}{t+1}|s,s',\ts{y}{1}{t},\ts{\hat c}{1}{t}(c')) = P(\ti{y}{t+1}|s)
\end{equation*}
and Bayes rule.
To derive Eqn.~\eqref{eq:phi} use the following\footnote{I use the
following abbreviations in this derivation:
\begin{align*}
c'(c,t+1) & \Rightarrow c' \\
\ts{\hat c}{1}{t}(c') & \Rightarrow \ts{c}{1}{t} \\
\ts{\hat c}{1}{t+1}(c) & \Rightarrow \ts{c}{1}{t+1}.
\end{align*}
}
\begin{align}
f(t+1,s,c') &= \frac{P(s,y_1^{t+1},\ts{c}{1}{t})}{P(\ts{y}{1}{t},\ts{c}{1}{t})}\\
\sum_{\bar s} g(\bar s,c) f(t+1,\bar s, c'(c,t+1)) &= \frac{
P(y_1^{t+1},\ts{c}{1}{t+1})}{P(\ts{y}{1}{t},\ts{c}{1}{t})}\\
P(c|s,y_1^{t+1},\ts{c}{1}{t}) &= g(s,c) \\
&= \frac{ P(s,y_1^{t+1},\ts{c}{1}{t+1})}{P(s,y_1^{t+1},\ts{c}{1}{t})}
\end{align}
and note
\begin{align*}
\frac{f(t+1,s,c')}{\sum_{\bar s} g(\bar s,c) f(t+1,\bar s, c')}
P(c|s,y_1^{t+1},\ts{c}{1}{t}) &= \frac{P(s,y_1^{t+1},\ts{c}{1}{t})~ P(\ts{y}{1}{t},\ts{c}{1}{t})
~ P(s,y_1^{t+1},\ts{c}{1}{t+1})} {P(\ts{y}{1}{t},\ts{c}{1}{t}) ~ P(y_1^{t+1},\ts{c}{1}{t+1})
~ P(s,y_1^{t+1},\ts{c}{1}{t})}\\
&= P(s|y_1^{t+1},\ts{c}{1}{t+1}) \equiv \phi(t+1,s,c).
\end{align*}
\subsubsection*{Algorithm:}
The algorithm makes a single pass through the sequence of observations
going forward in time. To move from time $t$ to time $t+1$, consider
each possible classification $c_\text{next}$, and find the best
predecessor classification $c_\text{best}$ at time $t$. Then collect
and store the best classification sequence ending in that
classification, its log probability, and the conditional probabilities
of the hidden states given that classification sequence. At the last
time step $T$, each possible final classification $\ti{C}{T}$ will be
associated with the best sequence ending in that classification and
the log likelihood of that sequence. The algorithm simply selects the
sequence with the highest log likelihood. Figure~\ref{fig:viterbiC}
provides pseudo-code for the algorithm.
%
\begin{figure}[htbp]
\begin{center}
\newcommand{\tnext}{_{\text{next}}}
\newcommand{\told}{_{\text{old}}}
\newcommand{\tbest}{_{\text{best}}}
\newcommand{\ick}{\log \left(\sum_{s} g(s,c\tbest) f(t+1,s,c\tbest) \right)}
\fbox{
\begin{minipage}{\columnwidth}
\begin{tabbing}
XX\=XX\=XX\=XX\=XX\=XX\=XX\=XX\= \kill
Initialize: \> \+ \\
for each $c$\\ \> \+
      $\nu\tnext (c) = \log \left( \sum_{s} g(s,c) P_{\ti{Y}{1},\ti{S}{1}}
\left(\ti{y}{1},s \right)\right)$ \\ \\ \< \- \< \-
Iterate: \> \+ \\
for $t$ from 1 to $T$\\ \> \+
Swap $\nu\tnext \leftrightarrow \nu\told$\\
for each $c\tnext$\\ \> \+
\\ \# Find best predecessor\\
      $c\tbest = \argmax_{c\told}\left( \nu\told(c\told) + \log
      \left(\sum_{s} g(s,c\tnext) f(t+1,s,c\told) \right) \right)$ \\
\\ \# Update $\nu$\\
      $\nu\tnext(c\tnext) =\,$ \= $\nu\told(c\tbest) + \log
      \left(\sum_{s} g(s,c\tnext) f(t+1,s,c\tbest) \right)$ \\ \> \-
\\ \# Update predecessor array\\
Predecessor[$c\tnext,t$] = $c\tbest$\\
\\ \# Update $\phi$\\
for $s$ in $c\tnext$\\ \> \+
Assign $\phi\tnext(s,c\tnext)$ using Eqn.~\eqref{eq:phi}\\ \\
\< \- \< \- \< \- %This stuff is for tabs \< \-
Backtrack: \> \+ \\
      $\ts{c}{1}{T} = \ts{\hat c}{1}{T}(\bar c)$, where $\bar c =
      \argmax_c \nu\tnext(c)$ at $t=T$
\end{tabbing}
\end{minipage}
}
\caption[Pseudocode for the Viterbi
Algorithm for class sequences.]%
{Pseudocode for the Viterbi Algorithm for class sequences}
\label{fig:viterbiC}
\end{center}
\end{figure}
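For concreteness, here is one way the pseudocode of
Fig.~\ref{fig:viterbiC} might be rendered in Python. This is a sketch
under simplifying assumptions (time-invariant transitions, observation
likelihoods precomputed in \texttt{p\_obs}), not the code used for the
contest entries:

```python
import numpy as np

def viterbi_classes(p_initial, p_transition, p_obs, state_class, n_classes):
    """MAP sequence of classifications for an HMM whose states are
    partitioned into classes.

    p_initial   : (n_states,) initial state probabilities P(S_1)
    p_transition: (n_states, n_states), p_transition[i, j] = P(S_{t+1}=j | S_t=i)
    p_obs       : (T, n_states), p_obs[t, s] = P(y_t | s), precomputed
    state_class : length-n_states class index of each state
    """
    T, n_states = p_obs.shape
    masks = [np.asarray(state_class) == c for c in range(n_classes)]

    # Initialize nu(1, c) and phi(1, s, c).
    joint = p_initial * p_obs[0]
    nu = np.full(n_classes, -np.inf)
    phi = np.zeros((n_classes, n_states))
    predecessor = np.zeros((T, n_classes), dtype=int)
    for c in range(n_classes):
        z = joint[masks[c]].sum()
        if z > 0.0:
            nu[c] = np.log(z)
            phi[c, masks[c]] = joint[masks[c]] / z

    # Iterate forward in time.
    for t in range(1, T):
        new_nu = np.full(n_classes, -np.inf)
        new_phi = np.zeros((n_classes, n_states))
        for c in range(n_classes):
            best_c, best_val, best_f = -1, -np.inf, None
            for c_old in range(n_classes):
                if nu[c_old] == -np.inf:
                    continue
                # f(t+1, s, c') = sum_s' phi(t, s', c') P(s | s') P(y_{t+1} | s)
                f = (phi[c_old] @ p_transition) * p_obs[t]
                z = f[masks[c]].sum()
                if z <= 0.0:
                    continue
                val = nu[c_old] + np.log(z)
                if val > best_val:  # best predecessor class
                    best_c, best_val, best_f = c_old, val, f
            if best_c >= 0:
                z = best_f[masks[c]].sum()
                new_nu[c] = best_val
                new_phi[c, masks[c]] = best_f[masks[c]] / z  # update phi
                predecessor[t, c] = best_c
        nu, phi = new_nu, new_phi

    # Backtrack from the best final class.
    c = int(np.argmax(nu))
    path = [c]
    for t in range(T - 1, 0, -1):
        c = int(predecessor[t, c])
        path.append(c)
    return path[::-1]
```

When each class contains exactly one state, the recursion reduces to
the ordinary Viterbi algorithm; the difference appears only when the
posterior mass of a class is spread over several of its states.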
\section{Assembling the Pieces}
\label{sec:Pieces}
As a final example for this book, I chose to revisit the CINC2000
challenge by combining the following pieces:
\begin{description}
\item[Low frequency oscillations of heart rate:] Oscillations like
those before 1:00 in the upper plot of Fig.~\ref{fig:a03erHR}
indicate apnea.
\item[High frequency phase modulation:] Strong bands of spectral power
between 10 and 20 cpm indicate normal respiration.
\item[Two pass classification:] First classify each entire record.
Then classify each minute in the record using a model selected on
the basis of the first classification.
\item[Viterbi decoding of classification:] For a given model and
sequence of observations, use the algorithm of
Section~\ref{sec:V4Class} to solve for a sequence of
classifications.
\end{description}
\subsection{Extracting a Low Pass Filtered Heart Rate}
\label{sec:LPHR}
From the raw ECG, I derived two kinds of sequences of observations.
At each sample time the code extracts a scalar \emph{low pass heart
rate signal} that conveys information about the low frequency heart
rate oscillations and a \emph{respiration vector} that conveys
information about high frequency phase oscillations. I used the
following signal processing steps to derive the low pass heart rate
signal.
\begin{enumerate} % Summary of rr2hr.py
\item Clip the raw ECG signal. The ECG signals have a few time
intervals during which the values have extremely large magnitudes.
I suspect that the electrical leads were physically disturbed at
those times. I eliminated those useless values as follows:
\begin{enumerate}
\item For each second, find the lowest, $l$, and highest, $h$, value
of the ECG signal
\item Sort these two collections of extreme values
\item Find $X_l$, the largest value that is smaller than 90\% of the
  $l$ values, and define the lower bound\footnote{Since $X_l$ is
    negative, the lower bound is even more negative, \ie, $B_l <
    X_l$.} as $B_l = 1.5 X_l$
\item Find $X_h$, the smallest value that is larger than 90\% of the
  $h$ values, and define the upper bound as $B_h = 1.5 X_h$
\item For every second that has an ECG value outside of the range
$[B_l,B_h]$ set all ECG values in that minute to 0.
\end{enumerate} %ecg\_clean.py
\item Find the time of each heart beat by applying the PhysioNet
program \emph{wqrs} to the clipped ECG signal. I call these times
the \emph{Rtimes}.
\item Create a sequence of raw pairs [\emph{time, rr-interval}] where
\emph{time} is an \emph{Rtime} and \emph{rr-interval} is the time
delay to the next \emph{Rtime}. %see rr2hr.py
\item Cull the sequence of pairs by removing entries that have extreme
\emph{rr-interval} values. For various reasons, the code misses
some heart beats. Without culling, the missed beats introduce
impulsive noise into the heart rate signal that is at least twice as
large as the real signal.
% top = 1.25*sortedD[ti] bottom = 0.8*sortedD[bi]
\item Create a heart rate signal, $hr$, sampled uniformly at 2 Hz by
interpolating between times in the sequence of remaining pairs.
\item Subtract the mean heart rate from $hr$ and then use the FFT
twice to eliminate frequency components below 0.09 cpm and above
3.66 cpm. Call the resulting signal $hr_l$.
\item Save $hr_l$ sampled 10 times per minute.% rr2hr.py
\end{enumerate}
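Steps (a)-(d) of the clipping procedure might look as follows; this is
my reading of the description above, with the 90\% level and the
factor 1.5 taken from the text:

```python
import numpy as np

def clip_bounds(ecg, fs=100):
    """Per-second extremes of the ECG, then clipping bounds at 1.5
    times the 90% extreme values.  `fs` is the sampling rate in Hz
    (100 Hz for the contest records)."""
    n_sec = len(ecg) // fs
    seconds = np.asarray(ecg, dtype=float)[:n_sec * fs].reshape(n_sec, fs)
    lows = np.sort(seconds.min(axis=1))    # per-second minima, ascending
    highs = np.sort(seconds.max(axis=1))   # per-second maxima, ascending
    X_l = lows[int(0.1 * n_sec)]    # 90% of the minima lie above X_l
    X_h = highs[int(0.9 * n_sec)]   # 90% of the maxima lie below X_h
    return 1.5 * X_l, 1.5 * X_h     # (B_l, B_h); X_l < 0, so B_l < X_l
```

Any second with a sample outside $[B_l, B_h]$ is then zeroed, as in
step (e).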
\subsection{Extracting Respiration Information}
\label{sec:RESP}
I also created a \emph{respiration} signal with a sample every tenth
of a minute where each sample is a vector with three components. As a
first step, I used the following procedure to derive
periodograms\footnote{Periodograms are calculated by applying the FFT
to a sample sequence and squaring the resulting coefficients. The
result is a Fourier transform of the autocorrelation of a finite
segment of a single sequence. Periodograms are often used, \eg, by
averaging, to estimate the Fourier \emph{power spectral density}
(PSD) as a function of frequency.} from the same \emph{Rtimes}
sequence I used for the \emph{low pass heart rate} signal:
\index{periodogram}
\begin{enumerate}
\item Read the \emph{Rtimes} data
\item Calculate a \emph{deviation} time series sampled uniformly at 2
Hz. The deviation is the difference between the actual value of an
\emph{Rtime} and the value that is midway between its nearest neighbors.
\item Calculate ten periodograms per minute using FFTs and Gaussian
windows with a support of 512 seconds and $\sigma=25$ seconds
\item Smooth each periodogram over frequencies
\item Normalize each smoothed periodogram to be a vector $F_j$
with unit length
\item Save the vector $F_j$ and the normalization factor $Z_j$
\end{enumerate}
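Steps 2 and 3 can be sketched as follows, with the window parameters
taken from the list above; the function names are hypothetical, and
the resampling of the deviation series onto a uniform 2 Hz grid is
omitted for brevity:

```python
import numpy as np

def rtime_deviation(rtimes):
    """Deviation of each interior beat time from the midpoint of its
    two neighbors (the 'phase jitter' of the heart beat times)."""
    r = np.asarray(rtimes, dtype=float)
    return r[1:-1] - 0.5 * (r[:-2] + r[2:])

def gaussian_periodogram(x, fs=2.0, support=512.0, sigma=25.0):
    """One periodogram of a 2 Hz deviation series, using a Gaussian
    window with 512 s support and sigma = 25 s."""
    n = int(support * fs)
    t = (np.arange(n) - n / 2.0) / fs      # window time axis in seconds
    w = np.exp(-0.5 * (t / sigma) ** 2)    # Gaussian window
    return np.abs(np.fft.rfft(x[:n] * w)) ** 2
```

A perfectly regular beat sequence has zero deviation everywhere;
respiration modulates the beat times and puts power into the deviation
series at the respiration frequency.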
I used \emph{Fisher linear discriminant analysis} to reduce the
information in each periodogram to two scalar values. To implement
the analysis, I collected the following three classes of periodogram
vectors: $C$, those drawn from \emph{c} records; $N$, those drawn from
\emph{a} records during minutes that are labeled \emph{normal}; and
$A$, those drawn from \emph{a} records during minutes that are labeled
\emph{apnea}.
A basis of Fisher linear discriminants $\mathbf{W}$ maximizes the
separation of classes compared to the spread within classes as
expressed by the objective function
\begin{equation*}
J(\mathbf{W}) \equiv
\frac{\left|\mathbf{W}\transpose \mathbf{S}_B \mathbf{W}\right|}
{\left|\mathbf{W}\transpose \mathbf{S}_W \mathbf{W}\right|},
\end{equation*}
where $\mathbf{S}_B$ is the \emph{between} class scatter and
$\mathbf{S}_W$ is the \emph{within} class scatter. My classes were
$C$, $N$, and $A$. Using the notation $E(j)=i$ to mean ``The minute
corresponding to periodogram $F_j$ was in class $i$.'', I can write my
calculation of $\mathbf{S}_B$ and $\mathbf{S}_W$ as follows:
\begin{align*}
m_i &= \frac{1}{n_i}\sum_{j:E(j)=i} F_j\\
  \mathbf{S}_i &= \sum_{j:E(j)=i} (F_j-m_i)
  (F_j-m_i)\transpose\\
\mathbf{S}_W &= \sum_i \mathbf{S}_i \\
n &= \sum_i n_i \\
m &= \frac{1}{n} \sum_i n_i m_i \\
  \mathbf{S}_B &= \sum_i n_i (m_i - m) (m_i - m)\transpose.
\end{align*}
While the classic solution (See page 123 of Duda, Hart, and
Stork\cite{Duda01}) is to choose a basis of eigenvectors corresponding
to the largest eigenvalues for the generalized eigenvalue problem
\begin{equation*}
\mathbf{S}_B w_i = \lambda_i \mathbf{S}_W w_i,
\end{equation*}
I solved the simple eigenvalue problem
\begin{equation}
\label{eq:LDAEV}
Q \Lambda Q\transpose = (\mathbf{S}_W + 100 \id)^{-1} \mathbf{S}_B
\end{equation}
and chose as $\mathbf{W}$ the two eigenvectors (two columns of $Q$)
that correspond to the two largest eigenvalues (elements of
$\Lambda$). In Eq.~\eqref{eq:LDAEV} $100 \id$ is a regularization
term that roughly quenches the effect of small or zero eigenvalues of
$\mathbf{S}_W$. Every tenth of a minute, I recorded the three numbers
$(\mathbf{W} F_j, Z_j)$ as a \emph{respiration} vector. See results
of the analysis in Fig.~\ref{fig:LDA}.
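The scatter matrices and the regularized eigenproblem of
Eq.~\eqref{eq:LDAEV} can be sketched as below. Note that
$(\mathbf{S}_W + 100 \id)^{-1} \mathbf{S}_B$ is not symmetric in
general, so this sketch uses a general eigensolver rather than the
symmetric factorization $Q \Lambda Q\transpose$ written above:

```python
import numpy as np

def fisher_basis(F, labels, n_keep=2, reg=100.0):
    """Regularized Fisher discriminant basis: the eigenvectors of
    (S_W + reg*I)^{-1} S_B with the largest eigenvalues.

    F     : (n, d) array of feature (periodogram) vectors
    labels: length-n array of class indices
    """
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    d = F.shape[1]
    S_W = np.zeros((d, d))
    means, counts = [], []
    for c in np.unique(labels):
        Fc = F[labels == c]
        m_c = Fc.mean(axis=0)
        means.append(m_c)
        counts.append(len(Fc))
        diff = Fc - m_c
        S_W += diff.T @ diff               # within-class scatter
    m = np.average(means, axis=0, weights=counts)   # grand mean
    S_B = sum(n_c * np.outer(m_c - m, m_c - m)      # between-class scatter
              for n_c, m_c in zip(counts, means))
    M = np.linalg.solve(S_W + reg * np.eye(d), S_B)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]    # largest eigenvalues first
    return vecs[:, order[:n_keep]].real
```

With two well-separated classes, the leading column of the returned
basis points along the direction joining the class means, up to the
within-class whitening.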
%
\begin{figure}
\centering{
\resizebox{0.65\textwidth}{!}{\includegraphics{figs/LDA1.pdf}}
\resizebox{0.3\textwidth}{!}{\includegraphics{figs/LDA2.pdf}}
}
\caption[Linear discriminant analysis of phase
jitter periodograms]%
{Linear discriminant analysis of phase jitter periodograms. The
plot in the upper left, shows the following mean periodograms:
$\mu_C$, the mean for the \emph{c} records; $\mu_{N}$, the mean of
the minutes that the expert classified as normal in the $a$
records; and $\mu_{A}$, the mean of the minutes that the expert
classified as apnea in the $a$ records. The lower left plot shows
the basis vectors that result from the linear discriminant
analysis. Scatter plots of the three classes projected on the
basis $(v_1,v_2)$ appear on the right.}
\label{fig:LDA}
\end{figure}
\subsection{Classifying Records}
\label{sec:ClassRec}
My classification algorithm analyzes each record in two passes. On
the first pass I assign an entire record to one of the following
groups: \emph{High} $(H)$, \emph{Medium} $(M)$, or \emph{Low} $(L)$.
The goal is to have the \emph{a-records} assigned to group $H$ and the
\emph{c-records} assigned to group $L$. The $M$ group is a buffer to
avoid the error of either assigning an \emph{a-record} to $L$ or a
\emph{c-record} to $H$.
Initially I hoped to use a likelihood ratio test for the first pass.
After some testing of different model classes, I selected the
following two models:
\begin{description}
\item[mod\_A2:] A two state HMM with AR-4 models for the low pass
heart rate signal and simple Gaussian models for the respiration
vectors.%
\index{AR|see{autoregressive (AR)}}%
\index{autoregressive (AR)}%
\nomenclature[rAR]{AR-n}{A linear \emph{autoregressive}
time series model of order $n$. Such a model uses a linear
combination of the previous $n$ observations to calculate a mean
and then adds a random noise term to the mean to generate the next
observation.}
\item[mod\_C1:] A simple model with a single AR-4 model for the low
pass heart rate signal and a single Gaussian model for the
respiration vectors.
\end{description}
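Given these two models, the first-pass statistic is a difference of
forward-algorithm log likelihoods. A minimal sketch, assuming each
model object exposes a \texttt{log\_likelihood} method (an interface I
am inventing here for illustration):

```python
def log_likelihood_ratio(record, mod_a2, mod_c1):
    """First-pass statistic for one record: the log likelihood
    under the apnea model mod_A2 minus the log likelihood under
    the normal model mod_C1.  Positive values favor apnea."""
    return mod_a2.log_likelihood(record) - mod_c1.log_likelihood(record)
```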
I trained mod\_A2 on all of the \emph{a-records} and trained mod\_C1
on all of the \emph{c-records}. When I applied the resulting
classifier to the training data, there were a few errors. To improve
the performance, I considered other signal characteristics. Since the
errors were on records for which the low pass heart rate oscillations
were small, I looked at characterizations of the amplitude of that
signal. I came up with the following statistic:
\begin{equation*}
R = \frac{S_{74}}{L_1},
\end{equation*}
where $S_{74}$ is the \emph{size} of the smallest waveform peak that is
larger than 74\% of the peaks in the record and $L_1$ is the average
of the absolute values of all the peaks in the record.
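Reading $S_{74}$ as the 74th percentile of the peak sizes, the
statistic can be sketched as follows. The percentile interpolation is
an approximation of ``the smallest peak larger than 74\% of the
peaks,'' and the peak detection itself is assumed done elsewhere:

```python
import numpy as np

def r_statistic(peak_sizes):
    """R = S_74 / L_1 for one record, where S_74 is (roughly) the
    smallest peak size larger than 74% of the peaks and L_1 is
    the mean of the absolute values of all the peaks."""
    sizes = np.abs(np.asarray(peak_sizes, dtype=float))
    s_74 = np.percentile(sizes, 74)
    l_1 = sizes.mean()
    return s_74 / l_1
```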
Using both the likelihood ratio and $R$, I was still unable to
correctly classify all of the \emph{a} and \emph{c} training records
with a straight line. I could let record $a06$ fall in the $L$ group
or let record $c08$ fall in the $H$ group. So I retrained the models
using augmented training data. In particular, I simply included the
records near the decision boundary in the training data more than
once. A
procedure called \emph{boosting}\cite{Freund96,Friedman2000} augments
training sets based on classification errors in a principled way. My
augmentation was ad hoc by comparison. The result (see
Fig.~\ref{fig:class1}) is a pair of statistics with which I could
classify all of the \emph{a-records} and \emph{c-records} correctly
using a line.
%
\begin{figure}[tbhp]
\centering{
\resizebox{\textwidth}{!}{\includegraphics{figs/class1.pdf}}
}
\caption[The first pass classifier]%
{The first pass classifier. I've plotted the location of each
record using the log likelihood ratio $llr$ and the ratio
statistic $R$. Records to the left of the line
$R = 2.39-\frac{llr}{2}$ are in the $L$ group. Records to the right
of the line $R = 2.55-\frac{llr}{2}$ are in the $H$ group. And those
in between are
in the $M$ group.}
\label{fig:class1}
\end{figure}
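In code, the first-pass grouping reduces to two thresholds on the
combined statistic $R + \frac{llr}{2}$; the thresholds 2.39 and 2.55
come from the decision lines in Fig.~\ref{fig:class1}, and the
function name is mine:

```python
def first_pass_group(llr, R, low=2.39, high=2.55):
    """Assign a whole record to the L, M, or H group from its log
    likelihood ratio llr and ratio statistic R.  The M group is a
    buffer between the two thresholds."""
    s = R + llr / 2.0
    if s < low:
        return "L"
    if s > high:
        return "H"
    return "M"
```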
Notice that in Fig.~\ref{fig:class1} I have plotted \emph{all} of the
records, the training data and the test data. Perhaps an honest man
would not have looked at the test data until he'd built a classifier
based on the training data alone. In preparing the models and code
for this chapter I did look at the test data, but I did not look
at the expert classifications. Although I was trying to recreate the
restrictions of the contest, those restrictions seem a little weak in
view of the fact that James demonstrated that he could classify the
test data using the ECG alone. I looked at each of the eleven
\emph{x-records} in the $L$ group and none of them seemed to have
evidence of apnea.
\subsection{Model Topology and Training Data}
\label{sec:topology}
After choosing the statistic $R+\frac{llr}{2}$ for classifying records
in the first pass, I had to select model structures and training
strategies for the classifiers that would implement the second pass.
In a somewhat haphazard fashion\footnote{I selected the topology to
optimize the performance of flawed software. Following a disk
failure, I rebuilt the software and fixed the known flaws, but I did
not reexamine the model topology.}, I selected HMMs with the
topologies drawn in Fig.~\ref{fig:structure}. In the figure, each
state label has the form $C_{Ti[j]}$ with the following
interpretation:
\begin{description}
\item[$C$] The \emph{class} of the state: Normal $N$ or Apnea $A$
\item[$T$] The \emph{type} of state: Home $H$, Isle $I$, or Peak $P$.
I assigned special observation classes to the second state in each
peak group ($P_{*2}$ in Fig.~\ref{fig:structure}) and to the times
that typical peaks occur in the data. Those assignments forced the
training algorithm to fit the observation models of the $P$ states
to the characteristics of typical peaks in the data.
\item[$i$] An index to separate multiple instances of the same type
\item[$j$] An index used only for $P$ states that identifies the subtype
\end{description}
%
\begin{figure}
\centering{\plotsize%
\def\Azero{$A_H$}%
\def\Aone{$A_{I1}$}%
\def\Atwo{$A_{I2}$}%
\def\Athree{$A_{P11}$}%
\def\Afour{$A_{P12}$}%
\def\Afive{$A_{P21}$}%
\def\Asix{$A_{P22}$}%
\def\Nzero{$N_H$}%
\def\None{$N_{I1}$}%
\def\Ntwo{$N_{I2}$}%
\def\Nthree{$N_{I3}$}%
\def\Nfour{$N_{I4}$}%
\def\Nfive{$N_{P11}$}%
\def\Nsix{$N_{P12}$}%
\def\azero{$A_H$}%
\def\aone{$A_{I1}$}%
\def\atwo{$A_{P11}$}%
\def\athree{$A_{P12}$}%
\def\nzero{$N_H$}%
\def\none{$N_{I1}$}%
\def\ntwo{$N_{I2}$}%
\def\nthree{$N_{I3}$}%
\def\nfour{$N_{P11}$}%
\def\nfive{$N_{P12}$}%
\def\ModH{\large$Mod_H$}%
\def\ModL{\large$Mod_M$ and $Mod_L$}%
\input{figs/structure.pdf_t}
}
\caption[Structure of HMMs for minute by minute classification]%
{Structure of HMMs for minute by minute classification in the second
pass of my procedure. I used the structure on the left for those
records classified as $H$ on the first pass and the structure on
the right for those records classified as $M$ or $L$ on the first
pass.}
\label{fig:structure}
\end{figure}
Starting from random parameters, I trained each of the three models on
subsets of the available training data. For the \emph{low} model,
$Mod_L$, I used records with classification statistics below 2.39,
that is all of the $c$ records and record $b04$. For the
\emph{medium} model, $Mod_M$, I used records with classification
statistics between 2.23 and 3.0, that is records \emph{c07, c08, b04,
a06, b01,} and \emph{a11}. And for the \emph{high} model, $Mod_H$,
I used records with classification statistics above 2.55, that is all
of the $a$ records, \emph{b01, b03,} and record $b02$. I chose the
training sets after looking at Fig.~\ref{fig:class1}. I wanted
records in the training set of each of the three models that would be
like the test records to which the model would be applied.
\subsection{Tunable Parameters}
\label{sec:tune}
After training the three models, I combined them with the first pass
statistic and classified each minute of the training data with the
following results:
\begin{description}
\item[Percent of training minutes correctly classified:] 78.87\%
\item[Number of normal minutes misclassified:] Of the 10,136 minutes
labeled normal by the expert, my code classified 3,162 or 31.2\% as
apnea.
\item[Number of apnea minutes misclassified:] Of the 6,452 minutes
labeled apnea by the expert, my code classified 343 or 5.3\% as
normal.
\end{description}
Looking at the numbers and wishing for a score above 90\%, I came up
with two more parameters to adjust. The first was a factor that acts
like a threshold adjustment. The idea was to favor normal
classifications just a little bit more in the hope that I could reduce
the number of misclassified normal minutes at the cost of a smaller
increase in misclassified apnea minutes. I implemented the adjustment
by multiplying all observation probabilities in \emph{normal} states
by a factor slightly more than one. The adjustment has the effect of
increasing the number of minutes classified as normal. If the factor
is less than one, it has the opposite effect. I called the factor
\emph{Fudge}.
The second factor I decided to adjust modifies the relative weighting
of the two components of the observations, \ie, the low pass heart
rate $y_{\text{hr}}$ and the respiration vector $y_{\text{resp}}$. In
section~\ref{sec:topology}, for the conditional probability of both
components given the state, I used the product of the conditional
probabilities,
\begin{equation*}
P(y|s) = P(y_{\text{hr}}|s) P(y_{\text{resp}}|s),
\end{equation*}
where $y\equiv\left( y_{\text{hr}} ,y_{\text{resp}} \right)$ denotes
the composite observation. To give more weight to the low pass heart
rate component of the model, I could raise that probability to a power
higher than one, \ie,
\begin{equation*}
P(y|s) = \left( P(y_{\text{hr}}|s) \right)^{\text{Pow}}
P(y_{\text{resp}}|s).
\end{equation*}
I've plotted a survey of the dependence of classification performance
on the adjustable parameters \emph{Pow} and \emph{Fudge} in
Fig.~\ref{fig:PFsurvey}. On the basis of such surveys, I selected the
following parameters:
\begin{center}
\begin{tabular}{l|lll}
& $Mod_L$ & $Mod_M$ & $Mod_H$ \\
\hline
Fudge & 2.2 & 1.15 & 1.05 \\
Pow & 0.9 & 2.5 & 2.1
\end{tabular}
\end{center}
%
\begin{figure}
\centering{\plotsize%
\def\pow{Pow}%
\def\fudge{Fudge}%
\def\x#1{\mbox{\hbox{$#1$}}}%
\def\y#1{\mbox{\lower2.0ex\hbox{$#1$}}}%
\input{figs/PFsurvey.pdf_t}
}
\caption[The response of classification performance to changes in \emph{Pow} and \emph{Fudge}]%
{The response of classification performance to changes in \emph{Pow}
and \emph{Fudge}. I've plotted the performance of $Mod_H$ trained
and evaluated on the \emph{H} group of records. As described in
the text, \emph{Pow} governs the relative weighting of the low
pass heart rate signal to the respiration characteristics and
\emph{Fudge} is a bias for choosing the \emph{normal}
classification. The $Z$ axis is the fraction of minutes
classified correctly.}
\label{fig:PFsurvey}
\end{figure}
\subsection{Results}
\label{sec:results}
With tuned parameters, the model classified 88.24\% of the minutes in
the training data correctly (see Table~\ref{tab:result1} for more
detail).
\begin{table}
\caption[Performance with tuned values of \emph{Fudge} and \emph{Pow} on training]%
{Performance with tuned values of \emph{Fudge} and
\emph{Pow} on training records. I've sorted the list in order of
how well the code classified the minutes in each record. For
each record, the number in the column labeled $N\rightarrow A$ is
the number of minutes labeled as \emph{normal} by the expert that
the code labeled as \emph{apnea}. The interpretations of the
other columns are similar.}
\centering{\plotsize%
\begin{tabular}{|llllll|}
\hline
Record & $N\rightarrow N$ & $N\rightarrow A$ & $A\rightarrow N$ & $A\rightarrow A$ & \% Right \\
\hline
\rule{0pt}{2.0ex}%
a11 & 198 & 46 & 138 & 84 & 0.6052 \\
b02 & 255 & 169 & 14 & 79 & 0.6460 \\
a06 & 276 & 27 & 140 & 66 & 0.6719 \\
a08 & 197 & 114 & 24 & 165 & 0.7240 \\
b01 & 362 & 105 & 2 & 17 & 0.7798 \\
a07 & 96 & 93 & 12 & 309 & 0.7941 \\
a18 & 37 & 14 & 82 & 356 & 0.8037 \\
b03 & 296 & 71 & 13 & 60 & 0.8091 \\
a03 & 175 & 98 & 0 & 246 & 0.8112 \\
a20 & 184 & 10 & 78 & 237 & 0.8271 \\
a15 & 91 & 50 & 36 & 332 & 0.8310 \\
a05 & 147 & 30 & 44 & 232 & 0.8366 \\
a16 & 140 & 21 & 51 & 269 & 0.8503 \\
a13 & 213 & 38 & 28 & 215 & 0.8664 \\
a09 & 90 & 24 & 39 & 342 & 0.8727 \\
a10 & 404 & 13 & 49 & 50 & 0.8798 \\
a14 & 69 & 57 & 2 & 381 & 0.8841 \\
a17 & 302 & 24 & 32 & 126 & 0.8843 \\
a02 & 72 & 36 & 19 & 401 & 0.8958 \\
a19 & 289 & 8 & 30 & 174 & 0.9242 \\
a12 & 14 & 29 & 3 & 530 & 0.9444 \\
b04 & 418 & 0 & 10 & 0 & 0.9766 \\
a01 & 11 & 8 & 0 & 470 & 0.9836 \\
a04 & 35 & 4 & 2 & 451 & 0.9878 \\
c07 & 424 & 0 & 4 & 0 & 0.9907 \\
c05 & 462 & 0 & 3 & 0 & 0.9935 \\
c09 & 465 & 0 & 2 & 0 & 0.9957 \\
c10 & 429 & 0 & 1 & 0 & 0.9977 \\
c03 & 452 & 1 & 0 & 0 & 0.9978 \\
c06 & 466 & 0 & 1 & 0 & 0.9979 \\
c02 & 500 & 0 & 1 & 0 & 0.9980 \\
c01 & 483 & 0 & 0 & 0 & 1.0000 \\
c04 & 481 & 0 & 0 & 0 & 1.0000 \\
c08 & 513 & 0 & 0 & 0 & 1.0000 \\
\hline
\rule{0pt}{2.0ex}%
sum & 9046 & 1090 & 860 & 5592 & 0.8824\\
\hline
\end{tabular}}
\label{tab:result1}
\end{table}
Applying the same model to the test data yielded a score of 86.37\%
with 11,292 minutes classified as normal and 5,976 minutes classified
as apnea. Thus my code classified 34.61\% of the minutes in the test
data as apnea while the expert classified 38.90\% of the minutes in
the training data as apnea.
In my final attempt to improve the score, I adjusted the fudge factor
used by $Mod_H$ so that the total fraction of minutes in the test data
classified as apnea matched the fraction of minutes in the training
set classified as apnea by the expert. Changing the fudge factor to
0.988 yielded a score of 86.95\% with 10,541 minutes classified as
normal and 6,727 or 38.96\% minutes classified as apnea. While I was
disappointed not to get a score with HMMs that would have won in 2000,
a look at Table~\ref{tab:cinc2000} verifies that the HMM performance
is among those \emph{top scores}.
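The adjustment in the last paragraph can be framed as a
one-dimensional search: treat the classifier as a black box that maps
a \emph{Fudge} value to the fraction of test minutes classified as
apnea, and bisect until that fraction matches the training fraction.
A generic sketch (I am not claiming this is how I arrived at 0.988):

```python
def match_fraction(apnea_fraction, target, lo=0.5, hi=2.0, tol=1e-3):
    """Bisect on the fudge factor until the classified apnea
    fraction matches the target.  Assumes apnea_fraction(fudge)
    decreases as fudge (the normal-state bonus) increases."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if apnea_fraction(mid) > target:
            lo = mid          # still too much apnea: raise fudge
        else:
            hi = mid
    return 0.5 * (lo + hi)
```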
\begin{table}
\caption[The scores described in this chapter]%
{Here are the scores described in this chapter interspersed
with the top scores from the CINC2000 website
(\url{http://www.physionet.org/challenge/2000/top-scores.shtml}).}
\centering{\plotsize%
\begin{tabular}{|l|p{25em}|l|}
\hline
Score & Entrant & Entries \\
\hline
\rule{0pt}{2.25ex}%
92.62 & J McNames, A Fraser, and A Rechtsteiner
Portland State University, Portland, OR, USA & 4 \\
92.30 & B Raymond, R Cayton, R Bates, and M Chappell
Birmingham Heartlands Hospital, Birmingham, UK & 8 \\
89.36 & P de Chazal, C Henehan, E Sheridan, R Reilly, P Nolan,
and M O'Malley
University College - Dublin, Ireland & 15 \\
87.56 & M Schrader, C Zywietz, V von Einem, B Widiger, G Joseph
Medical School Hannover, Hannover, Germany & 9 \\
87.30 & MR Jarvis and PP Mitra
Caltech, Pasadena, CA, USA & 3 \\
\hline
\rule{0pt}{2.25ex}%
86.95 & Second entry in this chapter. Adjust \emph{Fudge} to get
fraction of apnea minutes in the test records to match the fraction of
apnea minutes in the training records & \\
\hline
\rule{0pt}{2.25ex}%
86.37 & First entry in this chapter. Models and parameters tuned to
the training records. & \\
\hline
\rule{0pt}{2.25ex}%
85.63 & Z Shinar, A Baharav, and S Akselrod
Tel-Aviv University, Ramat-Aviv, Israel & 1 \\
85.54 & C Maier, M Bauch, and H Dickhaus
University of Heidelberg, Heilbronn, Germany & 5 \\
84.49 & JE Mietus, C-K Peng, and AL Goldberger
Beth Israel Deaconess Medical Center, Boston, MA, USA (unofficial
entry) & \\
\hline
\end{tabular}}
\label{tab:cinc2000}
\end{table}
%
\section{Classification Versus Estimation}
\label{sec:ClassVsEst}
In the early chapters of this book, I have emphasized choosing model
parameters to maximize the likelihood of the data, while the tweaks
and fudges I've used in this chapter are concerned with improving
classification performance rather than improving likelihood. While it
is true that if one had access to accurate probabilistic
characterizations of observations conditional on class membership the
best classifier would be a likelihood ratio classifier, it is not true
that without such characterizations the best approach to
classification is to estimate them. This point is made forcefully by
Vapnik\cite{Vapnik98}, who says, ``one should solve the
[classification] problem directly and never solve a more general
problem as an intermediate step.''
%%% Local Variables:
%%% TeX-master: "main"
%%% eval: (load-file "hmmkeys.el")
%%% End: