Professor Hasofer writes about
the lack of a clear specification of the alternative hypothesis in the WRR paper.
Although the WRR paper does not state the alternative hypothesis explicitly,
it is clear from the way the test statistic is formed that the alternative hypothesis is
that the distribution of distances between ELSs of the appellations and the dates is shifted
to the left, that is, toward smaller values. This alternative hypothesis is very different from what he suggests:
that "the encoder has actually put the appropriate dates nearest to each of the names according
to some distance measure."
Professor Simon writes:
"The probabilities quoted for the word clusters are computed by methods contrary to the
accepted laws of probability and used in situations where it is essentially impossible to
assign meaningful probabilities."
He explains this:
"Mr. Witztum’s calculation rely on multiplying together
lots of not so large numbers ... assuming independence ... in situations where independence is not a
valid assumption."
It is true that the statistic calculated by WRR is formed by multiplying fractions together,
and that multiplying would be the right way to combine them if they were probabilities of independent events.
In that case the distribution of the statistic would be known, and the Monte Carlo trials in
the WRR experiment would be unnecessary. But the fractions being multiplied do not represent
probabilities of independent events, so the distribution of the product is not known.
This is precisely why the WRR experiment must use Monte Carlo trials to establish a p-value.
In the context of the Monte Carlo trials, the statistic acts only as a score function,
and so the results of the experiment do not rest on any independence assumption.
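
The point about the score function can be illustrated with a minimal sketch. It is a generic Monte Carlo test, not the WRR procedure itself: the function names, the toy null model with deliberately dependent fractions, and the numbers used are all assumptions made for illustration. The product of fractions is never read as a probability; the p-value comes only from where the observed score falls among the simulated scores.

```python
import random

def product_score(fractions):
    """Multiply the fractions together. The result is used only as a
    score to be ranked, never interpreted as a probability."""
    score = 1.0
    for f in fractions:
        score *= f
    return score

def monte_carlo_p_value(observed_score, score_under_null, n_trials, rng):
    """Estimate a one-sided p-value by simulation.

    Smaller scores count as more extreme, matching an alternative in
    which values are shifted toward smaller numbers."""
    hits = 0
    for _ in range(n_trials):
        if score_under_null(rng) <= observed_score:
            hits += 1
    # Count the observed value among the trials so the estimate is never zero.
    return (hits + 1) / (n_trials + 1)

def dependent_fractions(rng, k=5):
    """Hypothetical null model: k fractions that share a common
    component, so they are NOT independent of one another."""
    shared = rng.random()
    return [0.5 * shared + 0.5 * rng.random() for _ in range(k)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed = product_score([0.2, 0.25, 0.3, 0.2, 0.25])  # made-up observed fractions
p = monte_carlo_p_value(
    observed,
    lambda r: product_score(dependent_fractions(r)),
    n_trials=2000,
    rng=rng,
)
print("estimated p-value:", p)
```

Because the reference distribution is generated by the same dependent mechanism that produces the score, the dependence among the fractions is accounted for automatically, and no independence assumption enters the p-value.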