Wednesday, November 2, 2016
Comparing Characteristics of Sporadic and Outbreak Associated Foodborne Illnesses United States 2004–2011 Volume 22 Number 7—July 2016 Emerging Infectious Disease journal CDC
Volume 22, Number 7, July 2016
Research
Comparing Characteristics of Sporadic and Outbreak-Associated Foodborne Illnesses, United States, 2004–2011
Eric D. Ebel, Michael S. Williams, Dana Cole, Curtis C. Travis, Karl C. Klontz, Neal J. Golden, and Robert M. Hoekstra
Author affiliations: US Department of Agriculture District of Columbia, Washington, DC, USA (E.D. Ebel, M.S. Williams, N.J. Golden); Centers for Disease Control and Prevention, Atlanta, Georgia, USA (D. Cole, R.M. Hoekstra); Leidos Incorporated, Reston, Virginia, USA (C.C. Travis); Food and Drug Administration, College Park, Maryland, USA (K.C. Klontz)
Abstract
Outbreak data have been used to estimate the proportion of illnesses attributable to different foods. Applying outbreak-based attribution estimates to nonoutbreak foodborne illnesses requires an assumption of similar exposure pathways for outbreak and sporadic illnesses. This assumption cannot be tested, but other comparisons can assess its veracity. Our study compares demographic, clinical, temporal, and geographic characteristics of outbreak and sporadic illnesses from Campylobacter, Escherichia coli O157, Listeria, and Salmonella bacteria ascertained by the Foodborne Diseases Active Surveillance Network (FoodNet). Differences among FoodNet sites in outbreak and sporadic illnesses might reflect differences in surveillance practices. For Campylobacter, Listeria, and Escherichia coli O157, outbreak and sporadic illnesses are similar for severity, sex, and age. For Salmonella, outbreak and sporadic illnesses are similar for severity and sex. Nevertheless, the percentage of outbreak illnesses in the youngest age category was lower. Therefore, we do not reject the assumption that outbreak and sporadic illnesses are similar.
A previous study used outbreak data to determine the relative contributions of 17 different food commodities to the annual prevalence of foodborne illness in the United States (1). That work assumed that the exposure pathways of foodborne outbreak illnesses were representative of those pathways for all foodborne illnesses, including outbreak-associated and sporadic (nonoutbreak) illnesses. However, this assumption cannot be tested directly because the food sources of sporadic illnesses typically are unknowable. In fact, despite the availability of multiple cases and controls that might enable examination of the likelihood of illness for different foods consumed, the food sources of outbreaks are identified in only about one half of all foodborne disease outbreaks investigated (2).
In lieu of a direct comparison of exposure pathways between outbreak and sporadic foodborne illnesses, we compare selected demographic, clinical, temporal, and geographic characteristics of outbreak and sporadic cases of illness caused by Campylobacter, Escherichia coli O157, Listeria, and Salmonella bacteria by using data from the Foodborne Diseases Active Surveillance Network (FoodNet) for 2004–2011. Such an analysis is limited but still useful. Although similarities between outbreak and sporadic cases in terms of disease characteristics would not imply that these cases have identical food exposures, notable differences in disease characteristics might indicate differences in food exposures.
Methods
Data submitted to the Centers for Disease Control and Prevention (CDC) by public health personnel from each FoodNet site indicate whether a case of foodborne illness is an outbreak or nonoutbreak (sporadic) case. We aimed to determine whether differences exist between outbreak and sporadic cases in terms of 6 selected characteristics of laboratory-confirmed Campylobacter, E. coli O157, Listeria, and Salmonella infections reported in FoodNet (3) during 2004–2011. The 6 characteristics examined were as follows: 1) the FoodNet site reporting the case; 2) the year in which a case occurred; 3) the season in which a case occurred; 4) the age of the patient (generally, the difference between submission date and reported date of birth); 5) the sex of the patient; and 6) the hospitalization status of the patient (i.e., whether the patient was hospitalized within 7 days of specimen collection).
Since 2004, the FoodNet surveillance catchment area has been stable. The FoodNet sites were Connecticut, Georgia, Maryland, Minnesota, New Mexico, Oregon, Tennessee, and selected counties in California, Colorado, and New York. To ensure sufficient data, we determined quintiles for season and age groups. Because the data distributions differed between the pathogens, these quintiles were determined for each pathogen separately. Sex and hospitalization status were binary variables.
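The pathogen-specific quintiles described above can be illustrated with a minimal sketch. It assumes equal-frequency cut points computed separately for each pathogen; the function names and the age values are hypothetical and are not FoodNet data, and the authors' software and exact cut-point rule are not stated in the article.

```python
from statistics import quantiles


def quintile_cutpoints(values):
    """Return the 4 cut points that split `values` into 5 equal-frequency groups."""
    return quantiles(values, n=5, method="inclusive")


def assign_quintile(value, cutpoints):
    """Return the 1-based quintile of `value` given ascending cut points."""
    for index, cut in enumerate(cutpoints, start=1):
        if value <= cut:
            return index
    return len(cutpoints) + 1


# Hypothetical patient ages (years) for two pathogens; illustration only.
ages = {
    "pathogen_a": [1, 2, 3, 5, 8, 13, 21, 34, 55, 89],
    "pathogen_b": [30, 45, 60, 62, 65, 70, 72, 75, 80, 90],
}

# Cut points are computed per pathogen because the age distributions differ.
cutpoints = {name: quintile_cutpoints(vals) for name, vals in ages.items()}
```

Because the cut points follow each pathogen's own distribution, a pathogen that mostly affects older patients ends up with a wide first age category and narrow later ones. With heavily tied values, the groups will not be exactly equal in size.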
Other variables of potential interest, such as source of specimen (e.g., stool, blood, or urine), race, ethnicity, and international travel, were not included in the analysis because there were relatively high percentages of missing observations for some pathogens and because percentages were highly variable over time and across other variables in the analysis, possibly introducing an unknown amount of surveillance bias and limiting interpretation of results. For example, the fraction of cases for which information on international travel by the patient was missing ranged from 6% for E. coli O157 to 44% for Campylobacter. Similarly, the fraction of cases for which information on race was missing ranged from 7% for E. coli O157 to 26% for Campylobacter. Our summary descriptions and final models are based on the set of FoodNet case reports for which all 6 variables are complete. Missing values for certain variables are described in the Technical Appendix.
To complete the analysis of these characteristics, we used a 2-step approach for each of the 4 pathogens examined. First, we conducted random forest and boosted tree analyses (4,5) to gauge the relative importance of the 6 characteristics in distinguishing between outbreak and sporadic cases. Random forest analysis is a data classification algorithm that seeks the best combination of factors to explain an outcome variable (i.e., outbreak or sporadic case). Boosted tree analysis pertains to the use of regression techniques (e.g., mean square errors) for measuring the fit of the trees to the data. We created random collections of classification trees and averaged those trees by a measure of how well each tree fit the data.
For each pathogen, we trained random forest models on ≈85% of the data; we used the remaining ≈15% of the data to validate the models' classifications of outbreak and sporadic cases. We used the G² statistic (a modified Wilks statistic) to identify more and less important factors (6). In a stepwise fashion, we removed the least important factors to determine if model misclassification of outbreak status improved for the training dataset or the validation dataset. We stopped the model simplification whenever removal of a factor caused misclassification to worsen. Factors that were not eliminated were carried on to the next step.
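As a rough illustration of how a G² value can rank factors, the sketch below computes the standard likelihood-ratio chi-square for a contingency table of outbreak status by factor level. This is an assumption: the article describes its G² only as a modified Wilks statistic, so the exact form used by the authors' software may differ. The counts are hypothetical.

```python
import math


def g2_statistic(table):
    """Likelihood-ratio chi-square: G^2 = 2 * sum(O * ln(O / E)).

    `table` is a list of rows of observed counts. E is the expected count
    under independence of rows and columns. Cells with O == 0 contribute 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Hypothetical counts: rows are outbreak and sporadic, columns are factor levels.
by_site = [[30, 10], [10, 30]]  # outbreak share differs strongly by level
by_sex = [[20, 20], [21, 19]]   # outbreak share is nearly the same by level

# A larger G^2 suggests the factor separates outbreak from sporadic cases better.
ranking = sorted(
    {"site": g2_statistic(by_site), "sex": g2_statistic(by_sex)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

In this toy example the factor with nearly identical outbreak shares across levels has a G² close to zero and would be the first candidate for removal.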
The second step of the analysis was logistic regression modeling. We used stepwise model building routines (7) to examine all main effects and interactions among the factor levels (i.e., model parameters) explaining the fraction of cases that are outbreak-associated cases (i.e., $\operatorname{logit}(p) = \ln[p/(1-p)] = X\beta$, where p is the probability of a case being an outbreak case, X is a matrix of the data with the number of rows equal to the number of cases and the number of columns equal to the total levels of explanatory variables considered, and β is the corresponding vector of model parameters). As a model identification guide, we used forward selection procedures and minimum Bayesian information criterion (BIC) scoring (8). BIC is a preferred selection criterion because it penalizes the inclusion of additional parameters more strongly than alternative statistics (e.g., the Akaike information criterion) (8,9).
We selected the logistic regression models with the lowest BIC scores as the best models. We used visual assessments of the residuals and interactions to assess the adequacy of the methods and models.
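The selection rule can be sketched with the textbook definitions BIC = k ln(n) − 2 ln(L) and AIC = 2k − 2 ln(L), where k is the number of parameters, n the number of cases, and L the maximized likelihood. The candidate models, log-likelihoods, and sample size below are hypothetical and are not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 * k - 2 * ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical candidates: (name, maximized log-likelihood, parameter count).
candidates = [
    ("site", -520.0, 10),
    ("site + year", -500.0, 17),
    ("site + year + season", -497.0, 21),
]
n_obs = 1000  # hypothetical number of cases

best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
```

Each extra parameter costs ln(n) under BIC but only 2 under AIC, so BIC is the stricter criterion whenever n is at least 8. With these made-up numbers, BIC selects the smaller model while AIC selects a larger one.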
Results
During the study period (2004–2011), <1% of Campylobacter infections reported in FoodNet were outbreak cases, but ≈20% of E. coli O157 infections were outbreak cases. Outbreak cases represented ≈5% of Listeria and Salmonella infections (Table 1).
Figure 1. Quintile categorization of season and age for persons with foodborne illness included in the analysis of Foodborne Diseases Active Surveillance Network (FoodNet) data, United States, 2004–2011.
Seasonal quintiles were similar across pathogens except for E. coli O157, for which the first season was longer than for the other pathogens, extending from January through the end of May (Figure 1). Age quintiles, however, differed substantially across pathogens. For example, to capture 20% of the data for Listeria, the first quintile was defined as cases in patients who were 0–38 years of age. In contrast, the first quintile for Salmonella only extended to patients 3 years of age. For Listeria, the relatively narrow quintile range for persons 60–80 years of age reflects the larger number of older persons among these cases. For the binary variables (sex and hospitalization), the frequency of male patients was ≈50% among all FoodNet cases for the 4 pathogens, and the percentages hospitalized for Campylobacter, E. coli O157, Listeria, and Salmonella infections were 16%, 44%, 93%, and 29%, respectively.
A descriptive treatment of the data shows that the frequency of outbreak cases among all FoodNet cases varied more for FoodNet site, year, patient age, and season than for sex and hospitalization status for each pathogen (Table 2). Compared with the other pathogens, Listeria exhibited substantial frequency ranges for some characteristics. For example, the percentage of Listeria cases that were outbreak versus sporadic cases per year varied from 0% versus 100% during 2007–2009 to 30.6% versus 69.4% in 2011. Variability was difficult to determine for Campylobacter because of the low frequency of outbreak-associated cases.
In general, FoodNet sites in Georgia and California had smaller percentages of outbreak cases, whereas Oregon and Colorado had larger percentages. California had small outbreak case percentages for Campylobacter (0.1%) and E. coli O157 (1.5%), whereas Georgia had the smallest percentage among all sites for Listeria (0.0%) and Salmonella (2.6%). Colorado had the largest outbreak case percentage among all sites for Campylobacter (1.0%) and E. coli O157 (38.9%), whereas Oregon and New Mexico had the largest percentages for Salmonella (20.5%) and Listeria (34.9%), respectively.
For each pathogen's random forest analysis, the G² statistic was smallest for the binary variables (sex and hospitalization). Furthermore, misclassification errors for the training and validation datasets were not substantively changed whether the analysis included all 6 factors or excluded sex and hospitalization status. Consequently, sex and hospitalization status were not important for classifying outbreak and sporadic cases for any of the pathogens, and these factors were excluded from the logistic modeling step.
Figure 2. Patterns of the Bayesian information criterion (BIC) statistic as a function of the number of model parameters are shown for the four pathogens included in the analysis of Foodborne Diseases Active...
Figure 3. Residual plots relative to fitted estimates of outbreak-associated case frequency for the best-fitting models used in the analysis of Foodborne Diseases Active Surveillance Network (FoodNet) data, United States, 2004–2011. A) Campylobacter;...
Plots of the BIC statistic for increasingly complex models illustrate that its value decreases to a minimum and then increases for more complicated models (Figure 2). For Campylobacter, the minimum BIC corresponds to a model containing just the FoodNet site parameters. For E. coli O157 and Listeria, the minimum BIC corresponds to a model with 16 parameters (9 for FoodNet site and 7 for year, with 1 reference value for each factor included in the intercept term). For Salmonella, the minimum BIC corresponds to a model with 152 parameters that includes all 4 factors (24 parameters plus the reference intercept), the FoodNet site by year interactions (63 parameters), the year by season interactions (28 parameters), and the FoodNet site by season interactions (36 parameters). Residual plots of the best-fitting models demonstrate reasonable fit to the data (Figure 3). These plots illustrate that the studentized residuals ([observed frequency − predicted frequency of outbreak-associated cases]/SE of predicted frequency) generally cluster within 3 SD of the mean.
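The parameter counts quoted above can be checked with simple arithmetic. The sketch below assumes reference (treatment) coding, in which a factor with m levels contributes m − 1 parameters and an interaction contributes the product of its factors' counts; the level counts come from the text (10 FoodNet sites, 8 surveillance years, and quintiles for season and age), and the variable names are illustrative.

```python
# Levels per factor, taken from the article's description of the data.
levels = {"site": 10, "year": 8, "season": 5, "age": 5}


def main_effect_params(factor):
    """A factor with m levels needs m - 1 parameters; 1 level is the reference."""
    return levels[factor] - 1


def interaction_params(factor_a, factor_b):
    """A two-way interaction needs (m_a - 1) * (m_b - 1) parameters."""
    return main_effect_params(factor_a) * main_effect_params(factor_b)


# E. coli O157 and Listeria: FoodNet site and year main effects only.
two_factor_total = main_effect_params("site") + main_effect_params("year")

# Salmonella: all 4 main effects, the intercept, and 3 two-way interactions.
main_effects = sum(main_effect_params(f) for f in levels)
intercept = 1
salmonella_total = (
    main_effects
    + intercept
    + interaction_params("site", "year")
    + interaction_params("year", "season")
    + interaction_params("site", "season")
)
```

Under these assumptions the counts reproduce the figures in the text: 9 + 7 = 16 for the two-factor models, and 24 + 1 + 63 + 28 + 36 = 152 for the Salmonella model.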
Figure 4. Interaction plots from the best-fitting Salmonella logistic regression model used in the analysis of Foodborne Diseases Active Surveillance Network (FoodNet) data, United States, 2004–2011. A) Year versus state; B) season versus...
Interaction plots from the best-fitting Salmonella model (Figure 4) illustrate the complex relationships between some model factors. For example, interaction plots demonstrated that, for some FoodNet sites (e.g., Oregon, California, and Minnesota), the estimated proportion of outbreak-associated cases can change substantially across years. Moreover, the directions of changes are inconsistent across the sites. For example, the peaks and troughs of Oregon's proportions across years are nearly the opposite of Minnesota's pattern. Likewise, the Salmonella interaction plots demonstrated interactions between the seasonal quintile and both the surveillance year and the FoodNet site. In contrast, the patterns for the age quintiles are consistent across surveillance years. Nevertheless, the first age quintile (0–3 years of age) has a markedly lower proportion of outbreak-associated cases relative to the other age quintiles. This underrepresentation of outbreak-associated cases among the youngest age quintile drives the significance of the age parameter in the logistic regression model.
Discussion
If foodborne illness source attribution estimates are to be effectively used for food safety decision making and monitoring success of interventions, the data used to generate them must be collected in a systematic fashion over time. Foodborne outbreak surveillance data have been systematically collected since 1973 and provide direct links between human illnesses and food sources. Although other methods of source attribution (e.g., case–control studies) can provide relevant estimates for different target populations, these estimates are potentially expensive, logistically complex, and not routinely conducted in the United States. Moreover, estimated attributable fractions are based on associations between illnesses and exposures, not proof of causality. The possibility that attribution estimates from outbreaks might not be reliably generalized to the bulk of estimated foodborne illnesses is recognized (1). Nevertheless, we cannot assess directly the validity of outbreak-based attribution estimates for application to the broader population of foodborne illnesses. Consequently, this study assessed similarities and differences between outbreak and sporadic cases across various case characteristics. If the examined characteristics of outbreak and sporadic cases are different for these data, then the assumption of similar exposure pathways is less plausible. FoodNet is particularly well-suited for this analysis, because it is the only US foodborne disease surveillance system that actively ascertains laboratory-confirmed human infections and distinguishes those cases that are associated with detected outbreaks.
In our analysis, the probability of a case being outbreak-associated varied significantly across the FoodNet surveillance sites for all 4 pathogens studied. Uncertainty exists for the causes of variability in the number of ascertained cases across FoodNet sites (10<