Relevance bias model for chemical risk assessment

June 27, 2011 at 8:02 pm | Posted in Feature Articles | 1 Comment

Over the previous two months we have been looking at the risk assessment (RA) process, to see whether it has features which help explain how RA can give a chemical such as BPA a relatively clean bill of health while its conclusions continue to be rejected by many of the researchers studying the toxicity of BPA.

First we looked at the mismatch between the needs of scientists involved in exploratory research and those of risk assessors trying to draw firm conclusions about a chemical’s safety (H&E #37). The mismatch helps explain why so few studies by academic researchers are included in risk assessments, even though they investigate the safety of the substances in question.

The research and regulation mismatch presents regulators with a dilemma: either change the way risk assessment is done so the findings of exploratory research can be included; or implement a programme to deliberately follow-up exploratory studies with the large-scale, carefully-documented studies risk assessors demand.

Second we looked at whether the risk assessment methodology itself might introduce biases into reviews of the safety of a substance like BPA (H&E #38). We explored the concept of “relevance for risk assessment”, central to how studies are selected for inclusion in a chemical RA.

We speculated that the demand for relevance might generate systematic bias in RA. If a relevance criterion leads to a consistent set of studies being excluded from RA, then consensus from different RAs about the safety of a substance may have less to do with the findings of studies than it has to do with how the studies are selected.

This month, we present an illustrative model of this potential bias.

Modelling the influence of relevance bias in risk assessment

For simplicity, we imagine that an existing tolerable daily intake for a substance (TDI: the amount of the substance to which someone can be exposed every day over a lifetime without appreciable risk to health) may be too high and is therefore under review.

We will assume there are 56 studies in the published literature about the toxicity of substance X (figure 1). The majority (n=48, grey in colour) are ordinary peer-reviewed studies of sufficient quality to have been published in an academic journal.

Figure 1. 56 studies, 8 GLP (orange), 21 finding toxicity (circled in blue)

The remaining 8 (marked in orange) have been carried out according to OECD/GLP guidelines. The studies which indicate that substance X is harmful at the existing TDI, and that the TDI should therefore be lowered, are circled in blue. At this stage, we do not know which of the total of 56 studies are sound.

Method A, figures 2 & 3. In risk assessment, studies are evaluated for their relevance to calculating a tolerable daily intake (TDI) of chemical X. Few ordinary peer-reviewed studies meet the data requirements risk assessors have for this purpose (as we described in H&E #37). The GLP studies are designed to meet the data requirements and necessarily do so. Requiring relevance leaves us with 10 studies for evaluation of a TDI, circled in green (figure 2).

Figure 2. Studies circled in green meet relevance criteria of risk assessment.

Normally, however, risk assessors apply a second filter, reliability, to the studies. In this instance, we will assume, following EFSA’s guidance on safety assessment of pesticides (EFSA 2011) and the reported review methodology of the EU SCENIHR committee (Health and Consumer DG 2011), that following OECD/GLP guidelines is not a strong indicator of a study’s reliability.

In this case, the reliability assessment concludes that six studies (circled in red) out of the 12 relevant for risk assessment are reliable enough for calculating an accurate TDI (figure 3).

Figure 3. Method A. Reliability assessment leaves 6 studies for weight-of-evidence assessment.

A weight-of-evidence analysis is then carried out to determine whether or not the TDI should be changed. For the sake of simplicity of the model, each study carries equal weight of 1, with a +1 score for showing safety and -1 for showing harm.

Three studies show the TDI should be lowered and three show the TDI is fine as it is, giving a total weight-of-evidence of zero on this assessment. This means there is no clear case for changing the TDI, so the TDI stays the same – or as RA committees would likely put it: there is no reason for concluding that substance X is harmful, so long as exposure does not exceed the existing TDI.
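
The scoring rule is simple enough to write out. The following is a minimal sketch in Python, assuming only the rule described above (every study has equal weight, +1 for showing safety and -1 for showing harm). The function name `weight_of_evidence` and the labels "safe" and "harm" are our own illustrative choices, not part of any RA methodology.

```python
def weight_of_evidence(findings):
    """Sum equal-weight scores: +1 for a 'safe' finding, -1 for a 'harm' finding."""
    scores = {"safe": 1, "harm": -1}
    return sum(scores[finding] for finding in findings)

# Method A: six reliable studies, three showing harm and three showing safety.
method_a = ["harm"] * 3 + ["safe"] * 3
print(weight_of_evidence(method_a))  # 0
```

A total of zero is what the model treats as "no clear case for changing the TDI".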

Method B, figures 2 & 4. If following GLP guidelines is taken as an indicator of reliability, as has been argued to be the case in EFSA and FDA risk assessments of BPA (Myers et al. 2009), then GLP studies are more likely to be included in the assessment. In this model, we assume the perceived reliability of GLP increases the total number of studies judged reliable to eight (circled in red, figure 4).

Figure 4. Method B. GLP is taken as an indicator of reliability, increasing the number of GLP studies in the RA by 2.

Of these studies, five show the TDI is acceptable and three suggest it should be changed. This gives a total weight-of-evidence of +2, which counts as evidence that the existing TDI is correct.

Method C, Figure 5. It is possible to assess the evidence with none of the RA filters in place, a model which most closely resembles the approach of the Chapel Hill statement about the safety of BPA (vom Saal et al. 2007).

In this method of assessing potential harm from substance X, usefulness for calculating a TDI is not a factor in determining relevance for a weight-of-evidence assessment; the only factor which matters is whether the study is of sufficient quality that its findings are highly likely to be true.

Here we assume that 16 of the 56 studies are found to be of high quality (circled in purple). 11 show toxicity and 5 do not, giving a total weight-of-evidence of -6, which is strong evidence of potential harm (figure 5).

Figure 5. Method C. Discounting relevance to RA as a selection criterion could lead to a very different picture of the evidence.


This simplified model illustrates how the process of selecting studies for review can alter conclusions about the safety of a substance. It has the appearance of a variety of selection bias.
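
To make the comparison concrete, the three outcomes can be tallied side by side. This is a minimal sketch in Python, assuming the counts given for Methods A, B and C above and the equal-weight scoring of the model; the names `selected` and `net_weight` are hypothetical and chosen only for illustration.

```python
# (studies showing safety, studies showing harm) left after each method's
# selection step, using the counts stated in the text for this model.
selected = {
    "A": (3, 3),   # relevance + reliability, GLP not treated as a reliability marker
    "B": (5, 3),   # relevance + reliability, GLP taken as an indicator of reliability
    "C": (5, 11),  # study quality only, no relevance-for-RA filter
}

def net_weight(safe, harm):
    """Equal-weight score: +1 per study showing safety, -1 per study showing harm."""
    return safe - harm

totals = {method: net_weight(*counts) for method, counts in selected.items()}
for method, total in totals.items():
    print(f"Method {method}: net weight-of-evidence {total}")
```

The literature is the same 56 studies in every case. Only the selection step differs, yet the totals range from 0 and +2 (Methods A and B) to -6 (Method C).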

Although Method C provides strong evidence of potential harm, this does not translate into a change in the TDI because the studies do not provide data from which a TDI can be calculated.

Method A and Method B corroborate each other’s findings, to generate a consensus view that substance X is safe. However, that consensus is driven by the methods both applying similar exclusion criteria prior to assessing the validity of the remaining studies, thus leading to similar results in the weight-of-evidence assessment.

Ironically, in Methods A and B it is the need for data from which a TDI can be calculated that leads to overestimation of the safety of substance X. This is the opposite of the intention of risk assessment, which is supposed to be conservative and err on the side of caution.

Obviously, weight-of-evidence analysis is more complex than portrayed, and this model cannot prove the existence of relevance bias, only anticipate (in broad brushstrokes) its potential effect. It serves as a reminder that consensus between reviews can be generated as much by application of similarly biased review methodology, leading to similar skewing of the results, as it can by objective analysis of the evidence.

Given the potential for relevance and reliability to alter which studies are selected for weight-of-evidence assessment, the possibility that the risk assessment process by its nature might produce biased results should be more closely examined.

There is also the question of what ought to be done if there is strong, consistent evidence of harm from studies which do not permit calculation of a TDI. Currently, bodies such as the European Food Safety Authority conclude in such circumstances that there is no reason to change the TDI. It might, however, be more prudent to conclude that a TDI cannot be calculated, or that there is reason for thinking the existing TDI is unreliable.

This has been noted in the minority opinion of EFSA’s assessment of the safety of BPA: “Due to methodological shortcomings, none of the new studies can be used to derive a more stringent NOAEL that could lead to a newly established numerical TDI value. However, due to the overall weight of evidence, the current TDI of 50 µg/kg body weight may not be confirmed as a full TDI and should be considered as temporary.” (EFSA 2010)

If it is accepted that studies relevant for RA can be contradicted by a consensus view generated by research not intended for calculation of a TDI, then regulators are faced with the following dilemma: either

  1. regulate on the basis of consistent evidence of possible harm, even if this evidence does not permit calculation of a safe exposure level, or
  2. set up independent laboratories able to determine, according to regulatory data requirements and with the latest scientific techniques, what the actual safe level is – as opposed to waiting for a researcher outside the regulatory system to take it upon themselves to produce the data which the regulators need.

Nature vs. Nurture: How do we measure the effect of the environment on health? [Video]

June 14, 2011 at 2:00 pm | Posted in Video | Leave a comment

In epidemiology, “attributable risk” is the difference in rate of a condition between an exposed population and an unexposed population. Estimating what proportion of attributable risk is due to various environmental causes is challenging; in this video Stephen Rappaport presents an overview of the difficulties, such as different definitions of “environmental exposure”, and introduces the concept of the “exposome” as a conceptual tool for guiding ongoing research into the environmental causes of illness.
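
As a small worked example of that definition, here is a sketch in Python. The rates are invented purely for illustration and are not taken from the video or from any study; the function name `attributable_risk` is likewise our own.

```python
def attributable_risk(rate_exposed, rate_unexposed):
    """Difference in the rate of a condition between exposed and unexposed populations."""
    return rate_exposed - rate_unexposed

# Hypothetical rates, in cases per 1,000 people.
rate_in_exposed = 30
rate_in_unexposed = 10
print(attributable_risk(rate_in_exposed, rate_in_unexposed))  # 20 cases per 1,000
```

The hard part, as the presentation explains, is not this subtraction but deciding how much of the difference belongs to each environmental cause.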

In this presentation, Paolo Vineis expands on the concept of the exposome as it applies to environmental causes of cancer. He looks at discrepancies in estimates of the percentage of cancers caused by the environment, and at the importance of biomarkers in improving the accuracy of the data behind those estimates, which is currently only about 70% specific or lower.

These presentations were part of a workshop organised by the US Standing Committee on Emerging Science for Environmental Health Decisions. For more information visit their website; for updates on new workshops and other information, sign up for their newsletter.

The next Standing Committee workshops are on mixtures and cumulative risk assessment (27-28 July 2011, Washington DC, more information here) and toxicology and green chemistry (20-21 September 2011, more information available soon).

5&5: News and science highlights from May 2011

June 6, 2011 at 2:40 pm | Posted in News and Science Bulletins | Leave a comment


New research published in May found that exposure to a combination of 3 pesticides increased the risk of Parkinson’s Disease (PD) by 300%, whereas a combination of 2 increased it by only 80%.

Environmental Illness in U.S. Kids Cost $76.6 Billion in One Year: In 2008 it cost a “staggering” $76.6 billion to cover the health expenses of American children who were ill because of exposure to toxic chemicals and air pollutants, according to new research by senior scientists at the Mount Sinai School of Medicine in New York.

A burning issue: A thorough and helpful explanation in Nature News about the significance of recent research finding widespread use of flame retardants in infant furniture, addressing concerns arising from stories run by newspapers such as USA Today.

Endocrine disruptor challenges for chemical legislation: May was an important month for the issue of endocrine disruption, a fast-growing priority for EU chemicals regulation, summarised by the EU news website Euractiv. SIN-List developers ChemSec added 22 EDCs to their list of over 300 chemicals which they argue fit the EU’s criteria for a regulatory decision on safety, while US research group TEDx published a list of 800 chemicals for which there is some evidence of ED properties. A summary of some of the difficulties legislators face with EDCs can be found here.

Hitting the Bottle: Comment piece in the New York Times voicing the concerns about the lack of safety testing of the alternatives to BPA, which companies such as US retail food chain Kroger are introducing in their efforts to eliminate their use of the oestrogenic packaging additive.

Impaired intellectual development in the young, Parkinson’s in the old: New additions to the body of research associating health problems with pesticide exposure were picked up by some media outlets. Writer and journalist Elizabeth Grossman described three studies of children which have produced consistent evidence of pesticides’ effects on cognitive skills and short-term memory. Researchers also found that mixtures of pesticides increase the likelihood of Parkinson’s Disease by more than the sum of the increased risks from each pesticide on its own (the study can be found here).


An Evidence-Based Medicine Methodology To Bridge The Gap Between Clinical And Environmental Health Sciences. A new methodology using principles from systematic review in medicine to help evaluate the strength and quality of evidence that a chemical may harm health, and to support evidence-based decision making by clinicians and patients. The methodology itself is in the appendix to the paper.

Perinatal exposure to environmentally relevant levels of bisphenol A decreases fertility and fecundity in CD-1 mice: Extended fertility study showing that doses of high-purity BPA cause effects similar to those of DES. The authors found effects at doses 2,000 times and 2 times lower than EFSA/FDA’s standard for a safe dose but no effect at the middle dose, providing further evidence that dose-response curves for EDCs are unpredictable.

Endocrine disruptors in the etiology of type 2 diabetes mellitus: Accessible and comprehensive review of the links between endocrine disruption and onset of T2 diabetes, concluding that: “Although more experimental work is necessary, evidence already exists to consider exposure to EDCs as a risk factor in the etiology of type 2 diabetes.”

Halogenated pollutants and obesity, diabetes: One study found that halogenated compounds based on BPA (such as the common flame retardant TBBPA) can interfere with the PPARγ receptor, of great interest to researchers because of its potential role in obesity. Epidemiologists reported that people living in areas with higher environmental levels of POPs were at higher risk of being admitted to hospital because of metabolic syndrome, while a different study found a persistent association between PCB 180, 163/164 levels and diabetes, even while controlling for lipid levels.

Perinatal Exposure to Bisphenol A at Reference Dose Predisposes Offspring to Metabolic Syndrome in Adult Rats on a High-Fat Diet: Animal study shedding light on the sometimes unpredictable ways in which chemicals can contribute to ill-health. In this case, a high-fat diet seemed to trigger the adverse metabolic effects of BPA, suggesting (as with diabetes) that it takes multiple factors, such as being obese, to be at greatest risk of harm from chemicals.
