Research Article | Pulmonology | Free access | 10.1172/JCI152088
1Department of Genetics and Genomic Sciences,
2Department of Environmental Medicine and Public Health,
3Institute for Exposomic Research, and
4Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
Address correspondence to: Supinda Bunyavanich or Gaurav Pandey, One Gustave Levy Place, Box #1498, New York, New York 10029, USA. Phone: 1.212.241.5548; Email: supinda@post.harvard.edu (SB). Phone: 1.212.659.8535; Email: gaurav.pandey@mssm.edu (GP).
Authorship note: YCL and HHLH contributed equally to this work.
Authors: Li, Y.; Hsu, H.; Chun, Y.; Chiu, P.; Arditi, Z.; Claudio, L.; Pandey, G.; Bunyavanich, S.
Published October 5, 2021
Air pollution is a well-known contributor to asthma. Air toxics are hazardous air pollutants that cause or may cause serious health effects. Although individual air toxics have been associated with asthma, only a limited number of studies have specifically examined combinations of air toxics associated with the disease. We geocoded air toxic levels from the US National Air Toxics Assessment (NATA) to residential locations for participants of our AiRway in Asthma (ARIA) study. We then applied Data-driven ExposurE Profile extraction (DEEP), a machine learning–based method, to discover combinations of early-life air toxics associated with current use of daily asthma controller medication, lifetime emergency department visit for asthma, and lifetime overnight hospitalization for asthma. We discovered 20 multi–air toxic combinations and 18 single air toxics associated with at least 1 outcome. The multi–air toxic combinations included those containing acrylic acid, ethylidene dichloride, and hydroquinone, and they were significantly associated with asthma outcomes. Several air toxic members of the combinations would not have been identified by single air toxic analyses, supporting the use of machine learning–based methods designed to detect combinatorial effects. Our findings provide knowledge about air toxic combinations associated with childhood asthma.
Air toxics are hazardous air pollutants that cause or may cause serious health effects (1). They are well-established detriments to human respiratory health, especially for children (2–8). In particular, exposure to air toxics early in life predisposes children to asthma, one of the most prevalent diseases in this demographic group. Epidemiological studies have linked prenatal and early-life exposure to air toxics with childhood wheeze, asthma, and altered lung function (6–14).
Although air toxics are generally analyzed and regulated as individual chemicals (6), we are exposed to combinations of air toxics in ambient air. The specific combinations of individual air toxics that influence childhood asthma have not been studied adequately. Assessing the respiratory health effects of multiple air toxics is challenging for several reasons (7, 15). First, it is logistically difficult and expensive to collect detailed, individualized exposure data for multiple air toxics using personal or local monitoring. Additionally, there are limited statistical methods to parse the effects of mixtures where individual air toxics may contribute only slightly to an adverse outcome but have a different impact in combination with other air toxics (15). As a result, few studies have considered exposure to air toxic mixtures and their associations with children’s health, including asthma (8, 15–17).
Several studies linking air toxic mixtures and health outcomes, as well as a prior review of 57 studies that examined air pollutants and their health effects, reached no consensus on the ideal methods for multi-pollutant analyses (6, 7, 15, 16, 18). A key limitation of the studies reviewed is that most metrics assume pure additivity of the effects of multiple air toxics, without consideration of synergistic and/or antagonistic interactions. Because of these challenges, air toxic combinations that collectively influence childhood asthma remain suboptimally characterized. Furthermore, identifying air toxic combinations associated with health outcomes is also difficult because of the exponentially large number of combination subsets in a set of air toxics, i.e., $2^N - 1$ combinations in a set of $N$ air toxics. Conventional statistical methods (19–24) and feature importance assessment using machine learning algorithms (16, 25–27) have not been effective for this task, since they generally assess the association of air toxics individually.
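To make the size of this search space concrete, the short Python sketch below counts the non-empty subsets of a set of N air toxics and checks the formula by brute force on a small set. The function name and the four placeholder labels are ours, for illustration only; N = 125 is the number of air toxics analyzed in this study.

```python
from itertools import combinations

def n_nonempty_subsets(n: int) -> int:
    """Number of non-empty subsets of a set with n elements: 2^n - 1."""
    return 2 ** n - 1

# Brute-force check on a small set: enumerate subsets of every size.
toxics = ["A", "B", "C", "D"]  # hypothetical placeholder labels
enumerated = sum(
    1
    for k in range(1, len(toxics) + 1)
    for _ in combinations(toxics, k)
)
assert enumerated == n_nonempty_subsets(len(toxics))

# With 125 air toxics, exhaustive enumeration is clearly infeasible.
print(n_nonempty_subsets(125))
```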
In this study, we hypothesized that exposure to combinations of air toxics during early life is associated with asthma outcomes in later childhood. These outcomes included current need for daily asthma controller medication, lifetime emergency room visit for asthma, and lifetime overnight hospitalization for asthma (Figure 1). Asthma-related medication use, emergency room visits, and hospitalizations are frequently studied asthma outcomes that reflect asthma severity, control, and health care usage (28–30). Although some studies have reported associations between particular air toxics and these asthma subphenotypes (9, 11, 12), none addressed our goal to identify combinations of air toxics from a large national assessment of air toxics associated with these asthma outcomes. We tested our hypothesis by geocoding levels of 125 air toxics from the US EPA’s NATA (31), one of the richest sources of multi–air toxic profiling across the United States, to the residential addresses of children with asthma from our ARIA study (32) to map each child’s exposure to air toxics during the first years of life. We addressed the challenges of combinatorial air toxic analysis by applying a machine learning–based algorithm called DEEP, which, to the best of our knowledge, is a novel method for this problem. DEEP uses the high-performing XGBoost (33) algorithm to identify air toxic combinations associated with health outcomes. The combinations identified using XGBoost were then adjusted for potential confounders, including age, sex, race and ethnicity, and family income, to identify early-life multi–air toxic combinations, statistical interactions within combinations, and demographic profiles associated with adverse asthma outcomes in later childhood. Our approach identified several combinations of air toxics associated with asthma.
Figure 1. Study overview. Exposure data for over a hundred air toxics from the US Environmental Protection Agency’s (EPA) National Air Toxics Assessment (NATA) database were geocoded to AiRway in Asthma (ARIA) cohort participants with mild to severe persistent asthma (n = 151), based on participants’ residential zip code. The Data-driven ExposurE Profile extraction (DEEP) method developed in this study was then applied to the air toxic data to identify multi–air toxic combinations associated with 3 childhood asthma outcomes: use of prescribed daily asthma controller medication, lifetime emergency department visit for asthma, and lifetime overnight hospitalization for asthma. In the first stage of DEEP, multi–air toxic combinations were identified via eXtreme Gradient Boosting (XGBoost) models consisting of decision trees. In the second stage, multivariable logistic regression models were used to identify air toxic combinations significantly associated with childhood asthma outcomes after adjustment for age, sex, race and ethnicity, and family income. (Some images in this figure were obtained from the open-source collection at https://www.flaticon.com and were made by Wanicon, Freepik, and flaticon.)
Characteristics of the study cohort. Table 1 shows the characteristics of the ARIA study (32) participants with asthma examined in this study. These 151 children with mild to severe persistent asthma were recruited from the Mount Sinai Health System, New York, New York, USA, with informed consent from their parents/guardians via an IRB-approved protocol. Participants had a mean age of 12 years (standard deviation 3.2 years) at the time of assessment and were of diverse self-identified racial and ethnic backgrounds (Table 1). Their asthma was generally not well controlled, with a mean score on the Asthma Control Test (ACT) (34) of 16.8 (maximum value of 25, representing optimal control) and 96% of the cohort reporting regular use of a short-acting β-agonist rescue inhaler.
Children who used daily asthma controller medication (n = 84, 56%) were younger than those who did not (n = 65; P = 0.048). Inhaled corticosteroids (ICSs) were used most frequently, both alone and in combination with long-acting β-agonists (LABAs). Children who had at least 1 lifetime emergency room visit for asthma (n = 103, 68%) were more likely to self-identify as Black or Latino (P = 0.03), had lower ACT scores than their counterparts who had never required an emergency department visit for asthma (P = $9.54 \times 10^{-3}$), and were more likely to be taking combination ICS/LABA as their daily asthma controller medication (P = $5.97 \times 10^{-3}$). Children who had been hospitalized overnight for asthma in their lifetime (n = 51, 34%) had significantly lower FEV1% on spirometry (P = 0.04), and higher rates of ICS/LABA (P = $2.98 \times 10^{-7}$) and leukotriene receptor antagonist (P = $3.08 \times 10^{-4}$) use for daily asthma treatment, compared with the participants with asthma who had never been hospitalized overnight for asthma.
Air toxic characteristics. Ambient annual average concentrations for over a hundred air toxics based on emissions inventories and computer simulation models are publicly available for each US census tract in the EPA’s NATA database (31). We mapped the available air toxic levels to the residential zip code for each child in our cohort. Ninety-four zip codes spanning 443 square miles across New York, New Jersey, and Connecticut were represented in this cohort. We used the closest calendar year of NATA data available subsequent to a child’s birth date. We retained only the air toxics whose levels were available for all the participants in the mapped data sets, yielding 125 air toxics for analysis.
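The two selection rules in this paragraph (nearest data year after birth, and keeping only air toxics measured for every participant) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the list of release years, and the toy exposure values are hypothetical, and treating a release in the birth year itself as "subsequent" is our assumption.

```python
from typing import Dict, List, Optional, Sequence

def closest_subsequent_year(birth_year: int,
                            available_years: Sequence[int]) -> Optional[int]:
    """Earliest available data year that is >= birth_year, or None."""
    candidates = [y for y in available_years if y >= birth_year]
    return min(candidates) if candidates else None

def toxics_available_for_all(
        profiles: List[Dict[str, Optional[float]]]) -> List[str]:
    """Keep only air toxics with a non-missing level for every participant."""
    if not profiles:
        return []
    shared = {t for t, v in profiles[0].items() if v is not None}
    for profile in profiles[1:]:
        shared &= {t for t, v in profile.items() if v is not None}
    return sorted(shared)

# Hypothetical release years and exposure values, for illustration only.
EXAMPLE_YEARS = [1996, 1999, 2002, 2005, 2011]
example_profiles = [
    {"acrylic acid": 0.02, "toluene": 1.3, "cobalt compounds": None},
    {"acrylic acid": 0.05, "toluene": 0.9},
]

print(closest_subsequent_year(2003, EXAMPLE_YEARS))
print(toxics_available_for_all(example_profiles))
```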
DEEP-enabled identification of combinations of air toxics associated with childhood asthma. We then applied DEEP to identify air toxic combinations associated with each of the 3 childhood asthma outcomes, namely the need for daily asthma controller medication, lifetime emergency room visit for asthma, and lifetime overnight hospitalization for asthma. In the first analytical stage of DEEP (detailed in Methods), for each outcome, the full data set was randomly split 100 times into training and test sets in an 80:20 ratio. For each split, an XGBoost model consisting of 100 decision trees was learned from the training set and evaluated on the test set in terms of the area under the receiver operating characteristic (ROC) curve (AUC score; ref. 35).
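The evaluation loop described above can be sketched without any machine learning library. The sketch below generates 100 random 80:20 index splits and computes the AUC directly from its rank-based definition; the XGBoost fitting step itself is deliberately omitted. The function names, the fixed seed, and the use of plain (unstratified) random splits are our assumptions, not details confirmed by the study.

```python
import random
from typing import List, Sequence, Tuple

def random_splits(n_samples: int, n_repeats: int = 100,
                  test_fraction: float = 0.2,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Generate repeated random (train, test) index splits."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((indices[n_test:], indices[:n_test]))
    return splits

def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

splits = random_splits(n_samples=151)
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))

# Toy labels and predicted scores, for illustration only.
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

In a full pipeline, a model would be fit on `train_idx` and its predicted probabilities on `test_idx` passed to `auc_score` for each of the 100 splits.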
In the second analytical stage of DEEP, we analyzed the combinations of air toxics from the XGBoost models, identified as root-to-leaf paths in the constituent decision trees, for each outcome. Note that a combination may consist of only 1 air toxic if that air toxic is sufficient to predict the outcome under consideration for a subset of the cohort, which gives DEEP flexibility in discovery. When a combination contains multiple air toxics, their sequence of appearance on the path indicates their relative order of relevance to the outcome being predicted, because variables closer to the root of a decision tree generally have higher predictive power than those closer to the leaves.
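As an illustration of how root-to-leaf paths translate into air toxic combinations, the sketch below walks a small binary tree stored as nested dictionaries. The tree layout, the `low`/`high` branch labels, the thresholds, and the leaf values are all hypothetical; this is not the representation used by XGBoost or by the DEEP implementation.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, str]   # (air toxic name, "low" or "high" branch taken)
Path = Tuple[Step, ...]

def root_to_leaf_paths(node: Dict[str, Any], prefix: Path = ()) -> List[Path]:
    """Enumerate every root-to-leaf path of a binary decision tree."""
    if "leaf" in node:   # leaf node: the path ends here
        return [prefix]
    name = node["toxic"]
    low = root_to_leaf_paths(node["low"], prefix + ((name, "low"),))
    high = root_to_leaf_paths(node["high"], prefix + ((name, "high"),))
    return low + high

def high_exposure_combinations(paths: List[Path]) -> List[Tuple[str, ...]]:
    """Keep paths on which every split takes the above-threshold branch.
    The order of names follows the path from the root downward."""
    return [tuple(name for name, _ in p)
            for p in paths
            if p and all(branch == "high" for _, branch in p)]

# Hypothetical tree: thresholds and leaf values are made up for illustration.
tree = {
    "toxic": "acrylic acid", "threshold": 0.5,
    "low": {"leaf": -0.2},
    "high": {
        "toxic": "cobalt compounds", "threshold": 0.1,
        "low": {"leaf": 0.05},
        "high": {"leaf": 0.4},
    },
}

paths = root_to_leaf_paths(tree)
print(high_exposure_combinations(paths))
```

Because names are collected in path order, the first element of each tuple is the air toxic closest to the root, matching the "primary member" reading of a combination.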
Next, the frequency of each combination was calculated as the number of models (out of 100) where the combination was included in at least 1 of the constituent trees. Candidate combinations were then identified as those with a frequency of at least 10. These combinations were then used in multivariable regression models to test their association with the asthma outcome of interest, while adjusting for age, sex, race and ethnicity, and income.
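The frequency rule can be sketched as a simple counting step: each combination is counted at most once per model, no matter how many of that model's trees contain it, and combinations reaching the threshold of 10 are kept. The data layout (a model as a list of per-tree sets of combinations) and the function names are hypothetical choices for this sketch.

```python
from collections import Counter
from typing import List, Set, Tuple

Combination = Tuple[str, ...]
Model = List[Set[Combination]]   # one set of combinations per tree

def combination_frequencies(models: List[Model]) -> Counter:
    """Count, for each combination, the number of models containing it
    in at least one tree."""
    freq: Counter = Counter()
    for trees in models:
        seen_in_model: Set[Combination] = set()
        for tree_combinations in trees:
            seen_in_model |= tree_combinations
        freq.update(seen_in_model)   # once per model, not once per tree
    return freq

def candidate_combinations(freq: Counter,
                           min_frequency: int = 10) -> List[Combination]:
    """Combinations appearing in at least min_frequency models."""
    return sorted(c for c, n in freq.items() if n >= min_frequency)

# Toy input: combination A occurs in 12 of 100 models, B in only 3.
A = ("acrylic acid", "cobalt compounds")
B = ("toluene", "phosphorus")
models = []
for i in range(100):
    tree_1 = {A} if i < 12 else set()
    tree_2 = {A, B} if i < 3 else set()
    models.append([tree_1, tree_2])

freq = combination_frequencies(models)
print(freq[A], freq[B], candidate_combinations(freq))
```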
After the first XGBoost stage of DEEP, 689 air toxic profiles across all the asthma outcomes were discovered. These profiles included both individual air toxics and their combinations. In the second stage of DEEP, 359 of these profiles were found to be significantly associated (P ≤ 0.05) with the respective outcome. After multiple-hypothesis correction by the Benjamini-Hochberg procedure (36), 273 air toxic profiles remained significantly associated (FDR ≤ 0.05) with at least 1 of the 3 outcomes. Our goal was to identify air toxic combinations whose increased levels are associated with adverse asthma outcomes. Therefore, among the significantly associated profiles, we focused on those that included air toxics with levels higher than the corresponding thresholds. Among these final combinations, 18 had 1 air toxic each (Figure 2), and 20 were multi–air toxic combinations (Figure 3).
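For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal pure-Python sketch of the adjustment is given below. The function name and the four example P values are ours and are not taken from the study.

```python
from typing import List, Sequence

def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up P values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.20]
example_fdr = benjamini_hochberg(example_p)
significant = [fdr <= 0.05 for fdr in example_fdr]
print(example_fdr, significant)
```

In this toy example, three of the four raw P values are at or below 0.05, but only one remains at FDR ≤ 0.05, which mirrors the drop from nominally significant to FDR-significant profiles described above.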
Figure 2. Air toxics individually associated with childhood asthma outcomes after adjustment for age, sex, race and ethnicity, and family income in ARIA cohort participants with persistent asthma (n = 151). For each outcome and air toxic, the strength of the association is shown in terms of its odds ratio (OR), 95% confidence interval (CI), and false discovery rate (FDR). P values for individual air toxics were obtained from multivariable logistic regression models and then adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure, yielding FDR values.
Figure 3. Multi–air toxic combinations associated with childhood asthma outcomes after adjustment for age, sex, race and ethnicity, and family income in ARIA cohort participants with persistent asthma (n = 151). For each outcome and combination, the strength of the association is shown in terms of its odds ratio (OR), 95% confidence interval (CI), and false discovery rate (FDR). P values for multi–air toxic combinations were obtained from multivariable logistic regression models and then adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure, yielding FDR values.
Air toxic combinations associated with asthma outcomes. Twenty multi–air toxic combinations and 18 individual air toxics were found to be significantly associated with at least 1 of the 3 asthma outcomes. The medians and interquartile ranges (IQRs) of the exposure levels of the 34 air toxics included in these associations are shown in Table 2.
Table 2. Air toxics identified by DEEP as significantly associated with at least 1 of the 3 asthma outcomes, either individually or in combination with other air toxics
Higher levels of 18 individual air toxics were significantly associated with worse asthma outcomes (Figure 2). ORs for these associations ranged from 1.56 to 2.65. Several of the identified air toxics are established risk factors for childhood asthma, especially the chemicals previously categorized as halogenated compounds, ketones, and ethers (8, 37–39). Among these, the air toxics most strongly associated with daily controller medication, emergency room visit, and overnight hospitalization were acrylic acid (OR = 2.10), mercury compounds (OR = 2.65), and ethyl chloride (OR = 1.87), respectively. Acetamide, pentachlorophenol, and polychlorinated biphenyls were associated with more than 1 asthma outcome.
A major strength of DEEP is its ability to identify multi–air toxic combinations associated with health outcomes. Indeed, here DEEP revealed significant associations between higher exposure to 20 multi–air toxic combinations and the 3 asthma outcomes of interest (Figure 3). Among these, 19 combinations included 2 air toxics and 1 included 3. The associations of these combinations were generally stronger than those of the individual air toxics, with ORs ranging from 1.60 to 3.19 (Figure 3).
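As a reminder of what an OR and its 95% CI express, the sketch below computes them from a 2×2 table using the standard log-odds (Woolf) approximation. This is a simpler, unadjusted analog: the ORs reported in this study come from multivariable logistic regression adjusted for age, sex, race and ethnicity, and family income, which this sketch does not reproduce. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple

def odds_ratio_with_ci(exposed_cases: int, exposed_noncases: int,
                       unexposed_cases: int, unexposed_noncases: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table."""
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts: 20 of 30 exposed children and 10 of 30 unexposed children
# have the outcome.
example_or, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
print(example_or, ci_low, ci_high)
```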
Notably, acrylic acid not only was the individual air toxic most strongly associated with daily controller medication (Figure 2) but also was the first (i.e., primary) member of 7 of the 9 multi–air toxic combinations associated with this outcome (Figure 3). Acrylic acid also appeared in 3 of the other 11 combinations associated with emergency room visit and overnight hospitalization for asthma (Figure 3), suggesting that it may be a major contributor to adverse asthma outcomes among children.
Three air toxic combinations were associated with lifetime emergency room visit for asthma, all with an OR of over 2 (Figure 3). Acetaldehyde, acrylamide, and acrylic acid were the primary exposures in these combinations, even though they were not individually significantly associated with the outcome. Other air toxics in these combinations, namely carbon disulfide and hydroquinone, were also not individually associated with this outcome. These findings highlight the main strength of DEEP, namely its ability to identify significant multi–air toxic combinations whose constituent air toxics may not be individually associated with the health outcome of interest.
Among the 8 air toxic combinations associated with lifetime overnight hospitalization for asthma, 1,4-dioxane, carbonyl sulfide, ethylidene dichloride, hydrochloric acid, and hydroquinone were the primary exposures (Figure 3). Both ethylidene dichloride and hydroquinone appeared in 3 of these 8 combinations, indicating that these 2 chemicals may play a role in the development of poor asthma outcomes among children. Most other air toxics in these combinations (Figure 3) were largely not individually associated with this outcome (Figure 2), again supporting DEEP’s ability to identify multi–air toxic combinations that may not be inferred from single air toxic associations.
Effect sizes of multi–air toxic combinations may not be evident from the individual associations of their members. Some air toxics had relatively low effect sizes when assessed individually (Figure 2) compared with the larger ORs from combination analyses (Figure 3). For example, acrylic acid was associated with daily controller medication, with an OR of 2.10 as an individual air toxic (Figure 2), but the ORs of its combinations with dimethyl phthalate, 1,1,1-trichloroethane, ethyl chloride, acetophenone, and cobalt were higher (OR 2.16 to 3.19; Figure 3). Also, none of these 5 air toxics was individually associated with the outcome. Similarly, hexachlorobenzene was associated with daily controller medication with an OR of 2.03 (Figure 2), while simultaneous exposure to the combination of hexachlorobenzene and dimethyl phthalate identified by DEEP had an OR of 2.96 (Figure 3). This was despite the fact that there was no significant individual association between dimethyl phthalate and the outcome. For the pair of toluene and phosphorus, neither air toxic was individually associated with daily controller medication (Figure 2), but their combination was associated with the outcome with an OR of 1.81 (Figure 3).
Similar cases of combinatorial effects were also seen for lifetime emergency room visit for asthma. For example, simultaneous exposure to polychlorinated biphenyls, acetaldehyde, and carbon disulfide was associated with 3.10-fold higher odds of the outcome (Figure 3), while the individual effect size of polychlorinated biphenyls was substantially lower (OR = 1.72; Figure 2). Similarly, the combination of acrylic acid and hydroquinone was significantly associated with emergency room visit with an OR of 2.73 (Figure 3), but neither was associated with the outcome individually (Figure 2).
We observed similar results for multi–air toxic combinations and lifetime overnight hospitalization for asthma. Exposure to hydroquinone was individually associated with this outcome with an OR of 1.79 (Figure 2), but in combination with ethylidene dichloride, the association was stronger (OR = 2.03; Figure 3). Similarly, carbonyl sulfide was not individually associated with this outcome (Figure 2), but it was the primary member in 2 of the multi–air toxic combinations found to be associated with overnight hospitalization (Figure 3).
In summary, the above comparison of the effect sizes of the individual air toxic (Figure 2) and multi–air toxic (Figure 3) associations demonstrated that combinations of air toxics had effects that were not fully explained by simply adding together the individual effects from their constituents. Overall, DEEP identified 34 air toxics associated with the asthma outcomes (Table 2), including 16 air toxics with significant effects only as members of combinations.
Statistical interactions among members of air toxic combinations. To assess potential synergy between members of air toxic combinations associated with asthma outcomes, we conducted statistical tests for interactions. Significant statistical interactions detected between air toxic members within the combinations are shown in Table 3. Acrylic acid was the primary air toxic (i.e., primary branch point in the decision tree) of all the combinations with significant statistical interactions. Although other combinations did not reveal significant interactions, such interactions remain possible given the limitations of statistical detection of interactions. Directed experimental work could be undertaken to test for additional interactions.
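One common way to quantify a statistical interaction between two binary exposures is on the multiplicative scale: the ratio of the joint OR to the product of the two single-exposure ORs, which equals the exponentiated interaction coefficient of a logistic model containing only the two exposures and their product term. The sketch below computes this ratio from stratified counts. It is an unadjusted illustration with made-up counts, and the study's own interaction tests are not specified here, so it should not be read as the method used for Table 3.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[int, int], Tuple[int, int]]  # (a, b) -> (cases, noncases)

def multiplicative_interaction(counts: Counts) -> float:
    """Ratio of the joint OR to the product of single-exposure ORs.
    A value of 1 means no interaction on the multiplicative scale."""
    odds = {key: cases / noncases for key, (cases, noncases) in counts.items()}
    reference = odds[(0, 0)]                  # neither exposure above threshold
    or_a_only = odds[(1, 0)] / reference
    or_b_only = odds[(0, 1)] / reference
    or_joint = odds[(1, 1)] / reference
    return or_joint / (or_a_only * or_b_only)

# Made-up counts keyed by (exposure A high, exposure B high).
example_counts: Counts = {
    (0, 0): (20, 40),
    (1, 0): (15, 20),
    (0, 1): (12, 16),
    (1, 1): (27, 6),
}

interaction_ratio = multiplicative_interaction(example_counts)
print(interaction_ratio)
```

A ratio above 1, as in this toy table, indicates that the joint exposure is associated with higher odds than the two single-exposure ORs would predict if they simply multiplied.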
Table 3. Air toxic combinations associated with asthma outcomes with statistically significant interactions between combination members
Representative air toxic combinations and demographic risk factors. Finally, one of the advantages of DEEP is that the trees constituting its underlying XGBoost models can be visualized and interpreted, which is difficult to do for several other machine learning methods. However, since it is difficult to simultaneously depict all the trees inferred by DEEP, we visualized sample trees that contained the most strongly associated multi–air toxic combination for each childhood asthma outcome. Sample decision trees inferred by DEEP for each of the outcomes are shown in Figures 4, 5, and 6, respectively. To provide an additional level of interpretation, we also compared the demographic characteristics (age, sex, race and ethnicity, and family income) of children exposed to each of these combinations with those of children who were not exposed (Tables 4, 5, and 6). Differences could suggest demographic risk factors that may increase a child’s exposure to these multi–air toxic combinations.
Figure 4. A sample decision tree learned by DEEP to predict daily asthma controller medication using NATA-derived air toxic data geocoded to each participant (n = 149). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination acrylic acid and cobalt compounds, which was most significantly associated with this outcome, is highlighted in red.
Figure 5. A sample decision tree learned by DEEP to predict lifetime emergency room visit for asthma from NATA-derived air toxic exposure data geocoded to each participant (n = 151). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination acetaldehyde and carbon disulfide and polychlorinated biphenyls, which was most significantly associated with this outcome, is highlighted in red.
Figure 6. A sample decision tree learned by DEEP to predict lifetime overnight hospitalization for asthma from NATA-derived air toxic data geocoded to each participant (n = 151). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination hydroquinone and ethylidene dichloride, which was most significantly associated with this outcome, is highlighted in red.
Table 4. Demographic characteristics of children exposed and not exposed to the acrylic acid and cobalt compounds combination, which was associated with daily asthma controller medication
Table 5. Demographic characteristics of children exposed and not exposed to the acetaldehyde and carbon disulfide and polychlorinated biphenyls combination, which was associated with lifetime emergency room visit for asthma
Table 6. Demographic characteristics of children exposed and not exposed to the hydroquinone and ethylidene dichloride combination, which was associated with lifetime overnight hospitalization for asthma
Acrylic acid and cobalt compounds was the air toxic combination associated with daily controller medication use with the highest OR of 3.19 (Figures 3 and 4). Children exposed to this combination were older than those who were not exposed (P = 0.02; Table 4).
Acetaldehyde, carbon disulfide, and polychlorinated biphenyls was the air toxic combination most strongly associated with lifetime emergency room visit for asthma (OR = 3.10; Figure 3 and Figure 5). Children exposed to this combination were younger (P = $5.34 \times 10^{-8}$; Table 5) and had lower family income than those who were not exposed (P = 0.019; Table 5). Exposed children were also less likely to be White (P = 0.0046; Table 5). These observations point to social disparities among these groups of children.
The most strongly associated combination for overnight hospitalization was hydroquinone and ethylidene dichloride (OR = 2.03; Figure 3 and Figure 6). Children exposed to this combination were younger (P = 0.00218; Table 6) and had lower family incomes (P = $8.26 \times 10^{-5}$; Table 6) than those who were not exposed.
Applying DEEP, a machine learning–driven algorithm, to a cohort of children with mild to severe persistent asthma, we identified several individual air toxics and air toxic combinations for which increased exposure during early life was associated with adverse asthma outcomes in later childhood. In particular, owing to the ability of DEEP to examine air toxic combinations, we identified 16 air toxics that were significantly associated with childhood asthma outcomes only in combination with other air toxics.
Many air toxics in the identified combinations, such as carbonyl sulfide, carbon disulfide, ethyl chloride, and ethylidene dichloride, are similar in structure and have analogous formation, production, chemical fate, and chemical transport properties (40). Ten air toxics in the combinations contained chlorine, 3 included heavy metal compounds, and many were acidic chemicals. This aligns with prior literature implicating acidic chemicals, chlorinated chemicals, and heavy metal compounds as risk factors for asthma and asthma severity (8, 41–44). However, the biological mechanisms through which these combinations of air toxics can jointly affect respiratory health and asthma merit further study.
Among the air toxics individually associated with asthma outcomes (Figure 2), triethylamine was associated with increased overnight hospitalizations for asthma. Triethylamine is a clear, colorless liquid used in waterproofing and as a catalyst, corrosion inhibitor, and propellant (45). It is a respiratory irritant, to which chronic exposure even at low levels can inhibit the function of organic cationic transporters, thus preventing efficient uptake of inhaled bronchodilators used to control acute asthma symptoms (46, 47).
Acrylic acid was individually associated with daily controller medication (Figure 2) and appeared as a member of at least 1 combination associated with all 3 outcomes (Figure 3). Furthermore, it was found to interact with other member air toxics of 3 combinations (Table 3). Acrylic acid is used in the manufacture of adhesives, elastomers, plastics, and coatings, as well as floor paints and polishers (48). Literature has suggested that the presence of water-soluble cobalt complexes increases the conversion of polyacrylic acid into acrylic acid, which is more biologically viable. Acrylic acid also reacts with cobalt complexes to produce organocobalt complexes (49). Additionally, hydroquinone acts as a stabilizer to prevent the polymerization of acrylic acid, which keeps the latter in a form with a lower molecular weight that is more biologically viable (50, 51). Our results, including evidence of statistical interactions between acrylic acid and other chemical compounds, suggest further investigation of mechanisms for acrylic acid’s associations with adverse childhood asthma outcomes.
Ethyl chloride, also known as chloroethane (C2H5Cl), and ethylidene dichloride (C2H4Cl2) are both chlorinated hydrocarbons. Ethyl chloride is used as a thickening agent and binder in paints and cosmetics, refrigerant, aerosol spray propellant, anesthetic, and blowing agent for foam packaging (52). We found ethyl chloride to be associated with asthma outcomes, both as an individual air toxic (overnight hospitalizations; Figure 2) and as a member of multi–air toxic combinations (daily controller medication; Figure 3). Ethylidene dichloride, which is used mainly as a solvent for plastics, oils, and fats, and as a degreaser and fumigant in insecticide sprays (53), appeared as a member of several combinations associated with lifetime overnight hospitalization for asthma (Figure 3). Both these compounds are well-known members of the chloroethane family, which comprises liposoluble chemicals that can be taken up by the lipoprotein within the alveolar film layer (AFL) (54). AFL disruption is observed in multiple pulmonary diseases, including acute respiratory distress syndrome, infant respiratory distress syndrome, emphysema, chronic obstructive pulmonary disease, asthma, chronic bronchitis, pneumonia, pulmonary infections, and idiopathic pulmonary fibrosis (55). Thus, chronic exposure to ethyl chloride and ethylidene dichloride may lead to dysfunction in the AFL, which may contribute to worse asthma control.
Hydroquinone, a commonly studied air toxic, was also identified in our analyses. While exposure to higher levels of hydroquinone alone was not associated with overnight hospitalization for asthma, DEEP identified it as a member of several multi–air toxic combinations associated with this outcome (Figure 3). Hydroquinone is commonly found in the indoor environment, and exposure to it has been associated with airway hypersensitivity (56–59). Hydroquinone is widely seen in cosmetic and health products, including skin creams (60). It is thought to prevent the polymerization of acrylic acid, methyl methacrylate, cyanoacrylate, and other monomers that are susceptible to radical-initiated polymerization, thus allowing them to persist in their original form (50, 51). This suggests a mechanism through which the identified synergistic combination of hydroquinone with acrylic acid (emergency room visit; Figure 3) is associated with adverse asthma outcomes. Although our analysis did not find a statistically significant interaction between hydroquinone and ethylidene dichloride, potentially due to lower exposure levels, hydroquinone is industrially added to shelf ethylidene dichloride as a stabilizer (61). This suggests that in the presence of hydroquinone, ethylidene dichloride is less likely to react with other chemicals in the environment and thus retains its toxic form longer (similar to acrylic acid). Thus, it is still possible that hydroquinone and ethylidene dichloride may act synergistically, but this needs to be investigated in future studies.
Although our study has advanced the identification of air toxic combinations associated with childhood asthma outcomes, it also has limitations. We used the NATA model (31) to estimate exposures rather than personal sampling or local monitors. Collecting personal or locally monitored measures for 125 air toxics at each cohort participant’s residence would be logistically and financially challenging. Given this, NATA is commonly used for estimating ambient exposures, since it is a well-validated deterministic dispersion chemical transportation model created by the EPA that accounts for sources included in the NATA emission inventory (31). NATA estimates of a given air toxic may underreport a personally or locally monitored value, since the latter may include emissions from indoor and undocumented sources not in the EPA’s inventory. For instance, higher personally monitored benzene concentrations relative to NATA-predicted values are likely due, at least in part, to indoor sources not included in the EPA’s inventory (62). Other studies also found discrepancies between NATA estimates and monitored chemical concentrations, due again, in part, to local or indoor sources (63, 64).
Despite the above limitations, using NATA as the primary source of exposure estimates has several strengths over locally monitored values. First, NATA has a finer geographical prediction resolution and spread than currently available monitoring sites (31). This enabled us to include participants in our study who may not have had a monitoring site close to their residence. Second, NATA data are generated from an advanced chemical transportation model that aggregates exposure over a long period and thus is able to capture transient exposures. Also, several factors potentially affecting air toxic estimates, such as seasonality, ambient temperature, meteorology, precipitation, and solar radiation, have already been incorporated into NATA’s model (31). This level of comprehensive modeling is typically not available from personal or local monitoring. Finally, local measurements may also have detection and quantification limits, while NATA is able to estimate air toxics even at lower levels and over a longer period. Because of these strengths, we chose to study NATA data in this work.
We recognize that our results do not provide evidence of a causal effect of any chemical on adverse childhood asthma outcomes, which will need further investigation. Additionally, our geocoding was based on a single zip code for each participant and so would not account for potentially dynamic exposures due to residential moves. Last, our study participants were all from the New York, New Jersey, and Connecticut tristate area. Thus, our findings may not generalize to other US regions or parts of the world. Future studies could examine combinations from other geographical regions and/or utilize direct air sampling to confirm the combinations identified in this study.
In conclusion, this study demonstrated innovative use of data science methods and data sources to identify specific combinations of early-life air toxic exposures associated with later childhood asthma outcomes. Our study suggests that chemical pollutants should be closely monitored together in combination, especially in locations with vulnerable populations.
An overview of the study approach is shown in Figure 1.
Study population. The study population included children with asthma from the ARIA study, a cohort recruited from the Mount Sinai Health System, New York, New York, USA (32). The study was approved by the Mount Sinai IRB. Children with asthma had mild to severe persistent asthma according to National Asthma Education and Prevention Program/National Heart, Lung, and Blood Institute Expert Panel Report 3 criteria (65) and positive bronchodilator response on spirometry or methacholine challenge, with provocative challenge causing a 20% fall in forced expiratory volume (PC20) < 12.5 mg/mL. Phenotyping for all participants included detailed questionnaires about asthma-related symptoms, medication and health care use, and pre- and post-bronchodilator spirometry following American Thoracic Society guidelines (65).
Asthma outcomes. We focused on the following 3 self-reported asthma-related outcomes: (a) current use of prescribed daily asthma controller medication (daily controller medication), (b) at least 1 emergency department visit for asthma during the patient’s lifetime (emergency room visit), and (c) at least 1 overnight hospitalization for asthma during the patient’s lifetime (overnight hospitalization).
Air toxic exposures. The air toxic exposure profile of each participant was derived from the EPA’s NATA (31). NATA estimates the annual ambient concentrations of over a hundred air toxics at each census tract in the United States based on emissions inventories and advanced computer simulation models (31). Seasonality of air toxics, ambient temperature, meteorology, precipitation, and solar radiation are incorporated into NATA’s model (31).
NATA data are available for 1996, 1999, 2002, 2005, 2011, and 2014 (27), and children in this study were born between 1997 and 2012. To assign the most representative ambient air toxic exposure levels to each participant, we mapped the residential zip code of each child in our cohort to the geometric centroid of the closest census tract (31). We then used the annual exposure data for that tract from the NATA release closest in time following the child’s birth year. This choice of year closest to birth was based on prior evidence that early-life exposure to air pollution is associated with childhood asthma outcomes (66–68). Finally, we retained the 125 air toxics that had data available, i.e., no missing data, for all participants in the final data set (15, 69).
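The release-selection rule above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function name `nata_release_for` is hypothetical, and the sketch assumes that a release published in the birth year itself counts as "following" the birth year.

```python
from bisect import bisect_left

# NATA release years as listed in the text.
NATA_YEARS = [1996, 1999, 2002, 2005, 2011, 2014]


def nata_release_for(birth_year, releases=NATA_YEARS):
    """Return the earliest release year that is >= birth_year.

    Assumption: a release in the birth year itself is accepted.
    """
    i = bisect_left(releases, birth_year)
    if i == len(releases):
        raise ValueError(f"no NATA release at or after {birth_year}")
    return releases[i]


if __name__ == "__main__":
    # Birth years in the cohort ranged from 1997 to 2012.
    for year in (1997, 2005, 2012):
        print(year, "->", nata_release_for(year))
```

Under this reading, every birth year in the stated range (1997 to 2012) maps to one of the five releases from 1999 onward.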
Covariates. We included age, sex, race and ethnicity, and family income as covariates in multivariable regression models based on considerations that these variables could confound associations between air toxic levels and asthma outcomes. Since the questionnaire completed by ARIA participants did not include queries about family income, we used the average income of each participant’s residential zip code obtained from US Census Business Patterns data (70) as a surrogate for this variable.
Data-driven ExposurE Profile extraction (DEEP). To identify multi–air toxic exposure profiles associated with asthma outcomes, we developed a data-driven method called DEEP (Figure 1). DEEP is inspired by a simpler method that we previously used to identify multi–air toxic combinations associated with children’s cognitive skills (15).
In the first stage of DEEP, exposure combinations are identified using XGBoost (33), an algorithm that uses an ensemble method to iteratively learn decision trees. XGBoost generally yields strong predictive power (71, 72) due to its use of multiple optimization methods, including regularization and gradient boosting, which reduces overfitting of models to training data. Specifically, the full exposure data set was randomly split 100 times into training and test sets in an 80:20 ratio. For each split, an XGBoost model consisting of 100 decision trees was trained from the training set to predict the outcome under consideration. This model was then applied to and evaluated on the corresponding test set in terms of the AUC score (35). The overall predictability of the target outcome was evaluated in terms of the average value of the AUC scores across the 100 training/test splits.
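The evaluation loop described above (repeated random 80:20 splits, one model per split, AUC averaged across splits) can be sketched without any external packages. The sketch below is not the DEEP implementation: `fit_predict` is a hypothetical callable that stands in for training an XGBoost model and scoring the test set, and skipping splits whose test set contains a single class is an assumption made here so that the AUC is always defined.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_over_splits(X, y, fit_predict, n_splits=100,
                         test_frac=0.2, seed=0):
    """Average test AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(1, round(test_frac * len(y)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        y_test = [y[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # assumption: skip splits with a one-class test set
        scores = fit_predict([X[i] for i in train],
                             [y[i] for i in train],
                             [X[i] for i in test])
        aucs.append(auc(y_test, scores))
    if not aucs:
        raise ValueError("no usable split")
    return sum(aucs) / len(aucs)


def first_feature_scorer(X_train, y_train, X_test):
    """Toy stand-in for a trained model: score = first feature value."""
    return [row[0] for row in X_test]


if __name__ == "__main__":
    # Synthetic data only; not study data.
    X = [[i / 49] for i in range(50)]
    y = [1 if i >= 25 else 0 for i in range(50)]
    print(mean_auc_over_splits(X, y, first_feature_scorer))
```

In practice, `fit_predict` would wrap a gradient-boosted classifier with 100 trees, as the text describes; the toy scorer is used here only to keep the sketch self-contained.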
The decision trees constituting each XGBoost model contain internal decision nodes, edges, and leaf nodes to represent how the value of an outcome could be predicted based on air toxic levels. Figures 4–6 show several trees derived in the current work. Each decision node in these trees contains an air toxic and a threshold value for its level. It is also connected by 2 edges representing the decisions made depending on whether an individual’s exposure was higher or lower than the threshold. Each of these edges is connected to either the next decision node or a leaf node. A leaf node determines the value of the outcome for the individual with the exposure profile represented by the decision path taken to reach it. Each decision and leaf node also represents a subpopulation of the cohort exposed to the air toxics on the path taken to reach it. Candidate multi–air toxic combinations are then defined as the air toxics and thresholds in the decision nodes constituting the paths from the root of a tree to the leaf nodes. We calculated the frequency of each combination as the number of XGBoost models (out of 100) where it was included in at least 1 of the constituent trees and set a threshold of at least 10 XGBoost models to identify the most relevant combinations. Note that, if 2 or more variables are highly correlated, and thus similarly associated with the outcome, a key characteristic of the decision trees in the XGBoost model is that they will include only 1 of these variables as an internal decision node. Thus, unlike traditional regression models, XGBoost is not as adversely affected by collinearity among the input variables. Furthermore, DEEP executes XGBoost 100 times on randomly selected training sets, and different selections of these variables may be included in the different trees inferred, thus enhancing the coverage of the air toxic profiles.
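The path extraction and frequency filter described above can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' code: trees are represented as hypothetical nested dictionaries (keys `feature`, `threshold`, `low`, `high`, `leaf`) instead of actual XGBoost model dumps, a combination is taken to be the ordered tuple of threshold conditions on a root-to-leaf path, and the threshold values in the demo are invented.

```python
from collections import Counter


def root_to_leaf_paths(node, prefix=()):
    """Yield each root-to-leaf path as a tuple of (toxic, op, threshold)."""
    if "leaf" in node:
        if prefix:
            yield prefix
        return
    toxic, thr = node["feature"], node["threshold"]
    yield from root_to_leaf_paths(node["low"], prefix + ((toxic, "<", thr),))
    yield from root_to_leaf_paths(node["high"], prefix + ((toxic, ">=", thr),))


def combination_frequency(models):
    """Count, for each path, the number of models containing it.

    `models` is a list of models, each a list of trees. A path that
    appears in several trees of one model is counted once for that model.
    """
    counts = Counter()
    for trees in models:
        seen = set()
        for tree in trees:
            seen.update(root_to_leaf_paths(tree))
        counts.update(seen)
    return counts


def frequent_combinations(models, min_models=10):
    """Keep combinations found in at least `min_models` models."""
    return {path: n for path, n in combination_frequency(models).items()
            if n >= min_models}


if __name__ == "__main__":
    # Illustrative tree; thresholds are made up.
    tree = {
        "feature": "hydroquinone", "threshold": 0.5,
        "low": {"leaf": -0.1},
        "high": {
            "feature": "ethylidene dichloride", "threshold": 0.2,
            "low": {"leaf": 0.0},
            "high": {"leaf": 0.3},
        },
    }
    other = {"feature": "acrylic acid", "threshold": 0.1,
             "low": {"leaf": 0.0}, "high": {"leaf": 0.2}}
    models = [[tree, tree]] * 12 + [[other]] * 3
    for path, n in frequent_combinations(models).items():
        print(n, path)
```

Counting models rather than trees mirrors the definition in the text, where the frequency of a combination is the number of XGBoost models (out of 100) that include it in at least 1 constituent tree.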
In the second stage of DEEP, a multivariable logistic regression model is built to assess the association of a candidate combination with the target outcome, adjusted for covariates. The asthma outcome is the dependent variable in this model, while the air toxic combination and covariates are its independent variables. The variable representing the combination takes a value of 1 for individuals exposed to it, determined using the threshold values of the constituent air toxics, and 0 otherwise. One model is built for each outcome and candidate combination, yielding the OR denoting the strength of the association between the two. The P values of all the associations are converted into FDRs after correcting for multiple-hypothesis testing using the Benjamini-Hochberg method (36). In this study, significant associations were identified as those with FDR ≤ 0.05.
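Two pieces of this stage lend themselves to a short illustration: coding exposure to a combination as a 0/1 variable, and converting P values into FDRs with the Benjamini-Hochberg procedure. The sketch below uses only the Python standard library; the function names and the example threshold values are hypothetical, and the regression fit itself is not shown.

```python
def exposed(levels, combination):
    """Return 1 if every threshold condition in the combination holds, else 0.

    `levels` maps air toxic name -> exposure level for one individual;
    `combination` is a sequence of (toxic, op, threshold) conditions.
    """
    ops = {"<": lambda a, b: a < b, ">=": lambda a, b: a >= b}
    return int(all(ops[op](levels[toxic], thr)
                   for toxic, op, thr in combination))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDRs), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up thresholds and exposure levels, for illustration only.
    combo = (("hydroquinone", ">=", 0.5),
             ("ethylidene dichloride", ">=", 0.2))
    child = {"hydroquinone": 0.7, "ethylidene dichloride": 0.3}
    print(exposed(child, combo))

    pvals = [0.01, 0.04, 0.03, 0.005]
    fdrs = benjamini_hochberg(pvals)
    print([p for p, q in zip(pvals, fdrs) if q <= 0.05])
```

The adjusted value for the P value of rank k among m tests is the minimum of p × m / k' over all ranks k' ≥ k, which is why the loop runs from the largest P value downward.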
Assessment of synergistic interactions in air toxic combinations. To assess potential synergy between members of air toxic combinations associated with asthma outcomes, we conducted statistical tests for interactions. Interactions between pairs of air toxics were assessed through additional multivariable regression models where the outcome was the dependent variable and predictors included the levels of the 2 toxics, their product as a representative of their interaction, and covariates. For combinations with 2 air toxics, this regression model was inferred from the whole cohort, while for combinations with 3 air toxics, analyses were conducted for the last 2 toxics on the sample meeting the threshold for the first toxic level in the combination. A significant interaction was identified if the P value of the interaction term in the model was lower than 0.05.
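As a sketch of the pairwise interaction model described above, and assuming a logistic link because the outcomes are binary (the text specifies only a multivariable regression model), the model for participant i can be written as follows. The symbols are introduced here for illustration: x_{i1} and x_{i2} are the levels of the 2 air toxics, z_i is the covariate vector, and Y_i is the outcome.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
  + \beta_3\, x_{i1} x_{i2}
  + \boldsymbol{\gamma}^{\top} \mathbf{z}_i ,
\qquad H_0 : \beta_3 = 0
```

Under this formulation, the interaction test corresponds to rejecting H_0 when the P value for \beta_3 is below 0.05.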
The DEEP framework is implemented in the Python programming language (71). The XGBoost, model evaluation (AUC score calculation), and regression components are implemented using the xgboost (33), scikit-learn (72), and statsmodels (73) Python packages, respectively.
Statistics. Statistical comparisons of subjects stratified by outcome or exposure were performed using 2-sided Student’s t tests for continuous variables and χ² tests for categorical variables. Statistics employed within DEEP and for air toxic interaction analyses were performed as described above.
Study approval. The study was approved by the Mount Sinai IRB. Parents of participants provided written informed consent.
HHLH, PHC, GP, and SB conceived the study. GP and SB supervised the work. YCL, HHLH, PHC, ZA, and YC managed and analyzed the data. YCL and HHLH drafted the manuscript. YCL, HHLH, YC, PHC, ZA, LC, GP, and SB reviewed, edited, and approved the manuscript. Order among co–first authors was determined based on contribution to results generation.
This work was supported by a pilot grant from the Department of Genetics and Genomic Sciences at Mount Sinai and NIH grants R01 AI118833, R01 HG011407, R01 HL147328, UG3 OD023337, and P30 ES023515. It was also supported in part through the computational resources provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. We thank Alfin Vicencio of the Mount Sinai Health System for his assistance with cohort recruitment and Jeanette Stingone of Columbia University for her technical advice.
Conflict of interest: The authors have declared that no conflict of interest exists.
Copyright: © 2021, American Society for Clinical Investigation.
Reference information: J Clin Invest. 2021;131(22):e152088. https://doi.org/10.1172/JCI152088.