Statistics is mainly about using the observed data to make conclusions about the "real state of affairs", underlying that data. A classical and most widely spread technique for making these conclusions is based on significance testing. In simple terms, the idea of significance testing is to ask the question: "if the real state of affairs were X, how probable would it be for us to obtain the data D we are currently observing?". If the answer is "rather unprobable" (e.g. p < 0.05), the common decision is to reject the proposition X in favor of the alternative "not X". Otherwise the researcher claims to "see no reason to reject X".
The logic behind that reasoning seems quite solid from the first glance, yet it is well known to be faulty. Naturally, the fact that the likelihood of the data P(D | X) is low need not imply that the underlying hypothesis is wrong - it might very well be the case that the data by itself is already rare enough to make this value low. The only correct way of making sound judgments is to consider the a-posteriori probability of the hypothesis P(X | D) instead. However, the latter can be quite inconvenient to compute. Besides, the wild popularity of significance tests and p-values seems to indicate that the issue is not at all that serious. Really, P(X | D) looks so similar to P(D | X), who cares?
The book "What If There Were No Significance Tests?", which I stumbled upon recently while browsing a stray library shelf, makes it clear that this issue is not a joke. It is a collection of chapters written by renowned statisticians (most of which some-why work in the field of psychology), that quite convincingly condemns the widespread overuse of p-values and the related significance-based hypothesis testing in favor of other approaches. The main point is nailed quite precisely in the very first essay by Jacob Cohen, which I strongly advise you to read right now in order to get rid of any illusions you might still have regarding significance testing. And when you're done with that, you can continue reading this post.
In the following I shall provide my personal summary of the marvelous "Member of Congress" example from J.Cohen's essay. So far it is the best illustration I know of, about why exactly it is dangerous to use significance tests blindly.
Improbable does not mean impossible
Consider the following situation. We have observed a person which we know to be a Member of the US Congress. We are interested in testing the hypothesis, that this person is an American citizen. To apply the significance testing methodology, we proceed by estimating the p-value:
P(Congressman | American) ~ 535/300 000 000.
This is clearly below the popular 0.05 threshold. As a result, we are forced to reject the null-hypothesis and conclude that the person is not an American citizen. Bummer.
What is the problem here? Well, one thing is worth noting - while the probability for an American to be a congressman is low, it is even lower (precisely, zero), for a non-American. So maybe we would have been better off if we expanded the procedure above to the following "maximum-likelihood"-style reasoning:
Considering that the likelihood P(Congressman | American) is greater than the likelihood P(Congressman | non-American), we must conclude that the person in front of us is an American rather than not.
Did we just solve the problem? Is it enough to consider "p-values both ways" to clear things up? No!
Maximum likelihood does not work
Let us now consider a reversed situation. We are faced with a person, which, we know, is an American. We are interested in the hypothesis that he is a congressman. Compute the two likelihoods:
P(American | Congressman) = 1
P(American | not Congressman) ~ 300 000 000 / 6 700 000 000
Observing that the first likelihood is greater than the second we are forced to conclude that the person in front of us is indeed a congressman. Bummer, again!
Only by multiplying the likelihood with the marginal probability P(Congressman) could we have obtained the correct decision. Which is, to say, we must have been estimating the probabilities the other way around from the start.
To summarize, be wary of these pitfalls. I would not agree with the strong negative opinion of the authors of the book, though. After all, a lot of stuff is quite fruitfully done nowadays using p-values only. However, each time you use them, do it sensibly and keep in mind the following two aspects:
- If your p-value is low, can this be solely due to low marginal probability of the data? What is the "reversed" p-value? What is the power of your test?
- If you suspect that your hypotheses might be subject to a highly non-uniform prior probabilities, do not use bare p-values. You must consider the prior!

and are faced with a task of devising a generic function of the sample, which could only depend on the values in the sample, but not on the ordering of these values. Alternatively, you might need to prove that a given statistic is constant with respect to all permutations of the sample. Finally, you might simply wish to have a convenient mapping for your feature vectors that would lose the ordering information, but nothing else.
instead of the original values. This is not always convenient. Firstly, the mapping of the original sample to the corresponding vector of order statistics (i.e. the sorting operation) is quite complicated to express mathematically. Secondly, the condition that the vector of order statistics is always sorted is not very pleasant to work with. A much better idea is to represent your data as a polynomial of the form
and
are equal if and only if their roots are equal, which means, in our case, that the samples
and
are equal up to a reordering.
different points (e.g. at
) - in any case we end up with the same amount of data as we had originally (i.e.
time to compute. Secondly, not every polynomial will have 
which takes a (relatively short) seed
and converts it into an element of
. In most cases, we as computer scientists are interested in functions
. However, there are other reasonable domains, too. For instance, when performing
, we would be interested in the functions of type
.
. Second, we can talk about the statistical properties of the output ![\Pr[\lim_n\frac{1}{N}\sum_{i=1}^N g(x_i)=\int_0^1g(x)dx] = 1](http://phd.kt.pri.ee/wp-content/cache/tex_a7d3523c75de0099fd197dd84cd995ef.png)
are taken independently and uniformly from the range
. However, computers are mostly made of deterministic parts and it turns out to be really difficult to automatically collect uniformly distributed bits. A design of a device that would solve this problem is far from trivial.
was explicitly specified through a big table. Of course, such a table is useful only if it can be used as a "reliable" source of random numbers. In particular, the value of
as possible. Since there are infinite number of functions, we cannot actually check this condition. Instead, statisticians performed a series of tests on the table to verify that the sequence
looks as close to random as possible. If we extend this concept properly, we get the modern quantification of pseudorandomness.
-secure pseudorandom generator if for any
-time algorithm
:![|\Pr [r\gets\mathcal{R}: A(r)=1]-\Pr[s\gets\mathcal{S}: A(f(s))=1]|\leq \varepsilon](http://phd.kt.pri.ee/wp-content/cache/tex_d8b282b291fe0d93c8ee8951662c280a.png)
. In more intuitive terms, if you replace the randomness
that runs in time
and outputs a solution that can be verified in time
. In this case, the use of a
-secure pseudorandom generator within
. Indeed, otherwise we could construct a
-time algorithm
that outputs 1, if
can be arbitrary real numbers. For the combinatorial search algorithm that takes 3 weeks CPU time, you might use a
-secure pseudorandom generator.
. For some tasks, such as Monte-Carlo integration of certain functions the corresponding solution is known (see multidimensional integration and sparse grids).
, rand() in C, or something more elaborate, depends on the application.









