One fundamental rule of statistical analysis is "Decide what you're testing before you start, and stick with it." It's all too easy, partway through an analysis, to see a pattern and decide to analyse that instead. Sets of data are always throwing up patterns - that's the nature of randomness. The only analysis that counts is the one you set yourself before you see the data; otherwise you could draw all sorts of conclusions from random clumping of data.
Even more important is "Don't have a preferred outcome." Sure, you set down a hypothesis - what statisticians call the "Null Hypothesis" (i.e. there is no significant difference between this group and that group, or this situation and that situation) - but you have to be totally open about whether the stats show your hypothesis to be true or false. Let the stats show you the answer, rather than you trying to squeeze the answer you want out of them.
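The idea of letting the stats decide can be sketched with a simple permutation test - one common way of testing a null hypothesis of "no difference between groups". The symptom scores below are invented purely for illustration, not Essex data:

```python
# A minimal sketch of a pre-declared null-hypothesis test: a permutation
# test of "no difference in mean response between two groups".
# All numbers are made-up illustration values.
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)          # relabel the data under the null hypothesis
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / n_a - sum(b) / len(b))
        if diff >= observed:
            hits += 1
    return hits / n_perm             # p-value: how often chance alone does as well

# Hypothetical symptom scores for two groups:
sensitives = [4.1, 5.0, 3.8, 4.6, 5.2, 4.9]
controls   = [3.9, 4.2, 3.7, 4.0, 4.4, 3.8]
p = permutation_test(sensitives, controls)
print(f"p = {p:.3f}")
```

The point is that the test (statistic, groups, significance level) is fixed before the data are seen; the p-value then simply reports how often random relabelling alone would produce a difference at least as large as the one observed.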
That's where the phrase "Lies, damn lies and statistics" comes from. With the data in front of you, already collected and tabulated, it's always possible to find a "clumping" of that data, looking at it one way or another, that proves what you want it to if you have an agenda. That's why it's so important to define your null hypothesis blind, before you have the data in front of you, and not to deviate from it.
The Essex Study was a "between groups x between treatments" study. It was looking at whether the reactions of two different groups - electrosensitives and controls - differed in terms of variation over three treatments - GSM, UMTS and Sham. The null hypothesis was effectively that there was no significant difference between the variation in the responses of ESs to those three treatments and the variation of the control group in response to those three situations. If this hypothesis proved true, that would effectively show that the ESs were no more aware of which exposure was which than the controls were.
In the Essex Study twelve of the ES volunteers dropped out at an early stage. (The Essex paper says that they all dropped out after the first session (see p.5); I'm advised by one participant that, in his case at least, he has documentary evidence from Essex of his participation in two sessions before dropping out through a severe ill-health reaction.) The study report records that the primary reason for these twelve dropping out was poor health; in the words of a number of those who withdrew, this means severe adverse reactions to exposure. It's therefore arguable that the most strongly-reacting 20% of the ES sample group were eliminated from the study, never to appear in the analysis statistics. This would have the effect of seriously reducing the chances of getting a result that differed significantly from the null hypothesis.
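The effect of losing the strongest reactors can be illustrated with a small simulation. Every number here - group size, effect size, detection threshold - is an invented assumption, not an Essex figure; the sketch only shows the direction of the effect:

```python
# Illustrative simulation (made-up effect sizes, not Essex data): removing
# the strongest 20% of a genuinely-reacting group makes a real group
# difference harder to detect.
import random

def detection_rate(drop_top_fraction, n_trials=2000, n=30, effect=0.8, seed=1):
    """Fraction of simulated trials in which a crude mean-difference criterion fires."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        es = sorted(rng.gauss(effect, 1.0) for _ in range(n))   # reacting group
        controls = [rng.gauss(0.0, 1.0) for _ in range(n)]       # non-reacting group
        keep = es[: n - int(n * drop_top_fraction)]  # drop the strongest reactors
        mean_es = sum(keep) / len(keep)
        mean_c = sum(controls) / len(controls)
        if mean_es - mean_c > 0.5:                   # arbitrary detection threshold
            hits += 1
    return hits / n_trials

full = detection_rate(0.0)
truncated = detection_rate(0.2)
print(f"detected with full sample:       {full:.2f}")
print(f"detected after dropping top 20%: {truncated:.2f}")
```

Under these assumptions the truncated sample triggers the detection criterion noticeably less often, which is the "top-slicing" effect argued above.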
Nevertheless, initial analysis of the figures, in accordance with that hypothesis, did show a significantly stronger reaction by the ES participants than by the control group in respect of UMTS radiation. That null hypothesis was thus shown to be false. It should be noted that this was not only despite the "top-slicing" from the ES sample of its possibly 12 strongest cases, but also despite the fact that the ES sample itself was only a third of the size properly needed to ensure the required sensitivity of the stats involved (Essex report p.10) - a smaller sample means less chance of a significant result being spotted.
So the Analysis of Variance (ANOVA) tests showed the hypothesis to be false - there was a significant difference between how the reactions of controls varied with exposure type and how the reactions of ES participants varied with exposure type. The straightforward conclusion from this is that electrosensitives are feeling something that the control group aren't.
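The kind of group-by-treatment interaction at issue can be sketched as a "difference of differences" - here tested with a simple permutation test standing in for a full ANOVA, and with hypothetical severity scores rather than Essex data:

```python
# Sketch of a group x treatment interaction test (hypothetical scores;
# a permutation test on the "difference of differences" is used here
# in place of a full two-way ANOVA).
import random

def interaction(es_umts, es_sham, c_umts, c_sham):
    """(ES UMTS - ES Sham) minus (Control UMTS - Control Sham)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(es_umts) - mean(es_sham)) - (mean(c_umts) - mean(c_sham))

def interaction_p(es_umts, es_sham, c_umts, c_sham, n_perm=5000, seed=0):
    rng = random.Random(seed)
    observed = abs(interaction(es_umts, es_sham, c_umts, c_sham))
    # Under the null, "ES" vs "control" labels are exchangeable within each exposure:
    umts = list(es_umts) + list(c_umts)
    sham = list(es_sham) + list(c_sham)
    n_u, n_s = len(es_umts), len(es_sham)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(umts)
        rng.shuffle(sham)
        stat = abs(interaction(umts[:n_u], sham[:n_s], umts[n_u:], sham[n_s:]))
        if stat >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical severity scores for each group/exposure cell:
p = interaction_p(es_umts=[5, 6, 7, 6], es_sham=[3, 4, 3, 4],
                  c_umts=[4, 4, 3, 4], c_sham=[4, 3, 4, 4])
print(f"interaction p = {p:.3f}")
```

A small p-value here would mean the ES group's UMTS-versus-Sham difference is bigger than chance relabelling can explain - i.e. the two groups' responses vary differently across exposures.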
But then something else cuts in. It's found that the computer-generated
schedule has given nearly half the ES sample their UMTS exposure during the
first long session. It's then reasoned that, of course, ES sufferers would
be more nervous in that first long session but of course would have settled
down by the second. Hence the stronger reactions.
It's exactly this sort of "post-hoc rationalisation" that can upset
the
applecart in respect of statistical analysis. If you start casting around,
after the event, for some other reason why it happened that way rather than
just going with your original hypothesis, you'll always find something,
somewhere in the stats.
It's also fair to observe that that line of reasoning is arguably totally false. We have here a group of people who believe - rightly or wrongly - that they are adversely affected by this radiation. In the first long session many experienced, for whatever reason, a significant reaction. The Essex report (pp. 18-19) says:
"It is not surprising that sensitive individuals would be more
anxious in
the first of the double-blind sessions, given the degree of uncertainty
they
may have felt in not knowing how the signal would affect them."
So, having experienced a significant reaction in the first long session, they will be more relaxed in the second session because they know what to expect???
[It's perhaps also worth noting that "anxiety" wasn't the condition
flagged
at a significant level by those electrosensitives.]
Having decided to verify a reason for this anomaly other than the one laid out in their original brief, the Essex team then conducted an analysis of variance on each of the three sessions, comparing ES and control reactions to the variation between UMTS and Sham. Since they didn't know at the outset how the computer would distribute the tests (with a large number of ESs getting UMTS in the first long session), this clearly wasn't part of the original plan - it's an add-on.
Not surprisingly, in Session 2 (the first long session) the figures for the ESs come out (a) larger than those for controls, and (b) larger for UMTS than for Sham (Table 3, p.25). The overall difference between ESs and controls even shows up as significant at the 5% level. However, we're now down to worryingly small group sizes: only 32 ESs overall (experiencing Sham and UMTS), and only 12 in the "Sham" exposure group for that first long session. Compare those figures with the sample sizes recommended on page 10 of the report for this sort of two-way analysis.
The effect of a smaller group size is to considerably reduce the chance of a result registering as significant. With a smaller group there has to be a larger error margin, so the figure needed to show a significant result has to be quite a bit bigger. That's exactly why a test of this sort needs a decent sample size to prove anything conclusively. And the decision of the Essex team to shift the goalposts to "Is there any significant effect within each session?" effectively leads them to base their final conclusion on very small group sizes, with a correspondingly small chance of finding a significant result.
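The arithmetic behind that point can be sketched with a standard normal approximation: the standard error of a difference of means grows as group size shrinks, so the smallest difference that reaches a 5% criterion grows with it. The group sizes below are illustrative only, not figures taken from the report:

```python
# Sketch of why smaller groups need bigger differences to reach significance.
# Normal approximation, equal group sizes, unit standard deviation assumed;
# the group sizes tried are illustrative, not the report's figures.
import math

def min_detectable_diff(n_per_group, sd=1.0, z=1.96):
    """Smallest mean difference meeting a ~5% two-sided criterion."""
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of a difference of means
    return z * se

for n in (100, 32, 12):
    print(f"n = {n:3d} per group -> difference needed ~= {min_detectable_diff(n):.2f} sd units")
```

At 12 per group the difference has to be roughly three times as large as at 100 per group before it counts as significant - which is exactly the "larger error margin" described above.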
If an unexpected result is reached on a test hypothesis, but there are one or more factors which may have a bearing on that result (as was considered to be the case here), the normal practice is to declare the result but also to refer to "confounding factors" which may have influenced it. It's also normal, in scientific studies, to be cautious in presenting what one study may or may not have shown.
The Essex Press Release proclaims "Study finds health symptoms
aren't linked
to mast emissions."
On the basis of the way the results of that study were handled, including a
subjective view of how electro-sensitives must have been thinking and a
decision to discount a significant result in the light of a
"post-hoc" analysis involving smaller sets of data, that claim is
open to question.
Professor Elaine Fox, leader of the Essex team, has been quoted in the media
as saying:
"Belief is a very powerful thing."
How right she is.
Dr. Grahame Blackwell, 27th July 2007