Guillaume Calmettes
Importance of visualizing your data
Know/Describe your dataset
How to present categorical and quantitative variables
Shape
Central tendency
Spread
A key step in the statistical investigation method is drawing conclusions beyond the observed data. Statisticians often call this statistical inference.
There are 4 main types of conclusions (inferences) that we can draw from data:
Could the observed result be considered statistically significant?
Is our result unlikely to happen by random chance?
Dr. Jarvins Bastian's experiment (1964)
In one phase of the study, Dr. Bastian had Buzz attempt to push the correct button
a total of 16 different times.
=> Buzz pushed the correct button
15 out of 16 times. ($\hat{p} = \frac{15}{16} = 0.938$)
Think about it:
Based on these data, do you think Buzz somehow knew which button to push?
Is 15 out of 16 correct pushes convincing to you?
Or do you think that Buzz could have just been guessing?
Buzz is just guessing (his probability of choosing the correct button is 0.50) and he got really lucky in these 16 attempts. $\pi=0.5$
Buzz is doing something other than just guessing (his probability of choosing the correct button is more than 0.50). $\pi>0.5$
The key question here is to determine what results would occur in the long-run under the assumption that Buzz is just guessing.
=>
We call this assumption of random guessing by Buzz the null hypothesis (or null model).
How could we decide between our 2 hypotheses?
Statisticians often employ chance models to generate data from random processes to help them investigate such processes.
What probability do we need to simulate to test our hypothesis?
=> 50/50 chance model
What would be a good simulation model for Buzz & Doris communication experiment?
=> coin flip
Model | Experiment |
---|---|
Coin flip | Button choice by Buzz |
Heads | Correct button |
Tails | Wrong button |
Chance of heads | Probability of Buzz pressing the correct button |
One set of 16 coin flips | One set of 16 experiments |
If Buzz was just guessing which button to push each time, what would be the number of correct choices we would observe for 16 attempts?
https://goo.gl/cYUbFn
1- Flip a coin 16 times
2- Record the number of heads that you obtain
3- Enter this value in the spreadsheet in the cell assigned to your name (please only fill in the value for your name!)
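The by-hand steps above can also be sketched in a few lines of code. This is a minimal illustration (the variable names are mine, not part of the activity): it performs one set of 16 fair coin flips and counts the heads, exactly what each student records in the spreadsheet.

```python
import random

# Steps 1-2 above: flip a fair coin 16 times, record the number of heads
flips = [random.choice(["H", "T"]) for _ in range(16)]
n_heads = flips.count("H")
print(f"Heads in 16 flips: {n_heads}")
```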
What are the:
observational units?
variables?
Hint: what does each data point represent?
Each dot represents the
number of heads
in
one set of 16 coin tosses
Does it seem that the number of correct button choices by Buzz would have been surprising if in fact he was just guessing?
We really need to simulate this random selection process hundreds, preferably thousands, of times to obtain the long-run pattern of our simulation.
Obtaining the long-run distribution of the number of heads in 16 coin flips would be very tedious and time-consuming with real coins, so we'll turn to technology to simulate it.
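One possible sketch of that technology step (the seed and function name are my own choices, made so the run is reproducible): simulate 1,000 sets of 16 fair coin flips and count how many sets produce 15 or more heads. Because Buzz's result is so extreme under the 50/50 model, very few (often zero or one) of the 1,000 sets reach 15 heads.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def heads_in_flips(n_flips: int) -> int:
    """Count heads in one set of n_flips fair coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

n_sims = 1000
results = [heads_in_flips(16) for _ in range(n_sims)]
extreme = sum(r >= 15 for r in results)  # sets with 15 or 16 heads
print(f"{extreme} of {n_sims} simulations gave >= 15 heads "
      f"(proportion {extreme / n_sims:.3f})")
```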
How many of these 1000 simulations produced 15 or more correct choices by Buzz?
1 out of 1000
What is the corresponding proportion of simulations that produced such an extreme result?
$\frac{1}{1000}=0.001$
A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true.
A small p-value casts doubt on the null hypothesis/model used to perform the calculation (in this case, that Buzz was just guessing which button to push).
A p-value is generally considered to provide:

p-value | Evidence against the null |
---|---|
$\leq{0.10}$ | some |
$\leq{0.05}$ | fairly strong |
$\leq{0.01}$ | very strong |
$\leq{0.001}$ | extremely strong |
The results ($\textrm{p-value}=\frac{1}{1000}=0.001$) mean our evidence is strong enough to be considered statistically significant. That is, we don't think our study result (15 out of 16 correct) happened by chance alone, but rather, something other than "random chance" was at play.
One goal of statistical significance testing is to rule out random chance as a plausible (believable) explanation for what we have observed. We still need to worry about how well the study was conducted.
- Could Buzz see the light through the curtain?
- Could Buzz have detected a pattern in the succession of light signals?
- We haven't completely ruled out random chance (but its probability is very small)
One option that Dr. Bastian pursued was to redo the study except now he replaced the curtain with a wooden barrier between the two sides of the tank in order to ensure more complete separation between Doris and Buzz.
=> In this case, Buzz pushed the correct button only 16 out of 28 times.
Buzz's successes: 16 out of 28 ($\hat{p} = \frac{16}{28} = 0.57$)
Simulations:
This time we need to do repetitions of
28 coin flips, not just 16.
p-value:
$\frac{2795}{10000}=0.280$
Not enough evidence that the "by-chance-alone" model
is wrong.
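The second simulation can be sketched the same way as the first (seed and variable names are my own): 10,000 sets of 28 fair coin flips, recording the proportion of sets with 16 or more heads. The result hovers around 0.28, matching the $\frac{2795}{10000}$ quoted above up to simulation noise.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible
n_sims = 10_000

# Count simulated sets of 28 fair flips with at least 16 heads
count_extreme = sum(
    sum(random.random() < 0.5 for _ in range(28)) >= 16
    for _ in range(n_sims)
)
p_value = count_extreme / n_sims
print(f"p-value ~= {p_value:.3f}")
```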
In fact, Dr. Bastian soon discovered that in this set of attempts the equipment malfunctioned and the food dispenser for Doris did not operate and so Doris was not receiving her fish rewards during the study.
=>
It is not so surprising that removing the incentive hindered the communication between the dolphins, and in this case we cannot rule out that Buzz was just guessing.
Dr. Bastian fixed the equipment and ran the study again. This time he found convincing evidence that Buzz was not guessing.
Conclusion/Generalization:
Dolphins can communicate abstract concepts!
1- Collect your sample and calculate your statistic of interest (ex: $\hat{p}=\frac{17}{24}=0.708$)
2- State your null and alternative hypothesis
(ex: H$_0$ $\pi=0.5$ / H$_a$ $\pi>0.5$)
3- Simulate your null hypothesis distribution. (it should be centered at the stated H$_0$ parameter of interest)
4- Calculate the proportion of samples that resulted in cases at least as extreme
as your initial observed statistic. This is your p-value.
(ex: $\textrm{p-value}=\frac{289}{10000}=0.029$)
5- Conclude about whether you have evidence in favor of or against the null hypothesis.
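The five steps above can be sketched end-to-end for the worked example ($\hat{p}=\frac{17}{24}$). This is one possible implementation, not a prescribed one; the seed and names are mine, and the simulated p-value lands near the quoted 0.029 up to simulation noise.

```python
import random

random.seed(3)  # fixed seed so the run is reproducible

# Step 1: observed statistic (example from the text: 17 successes out of 24)
n, successes = 24, 17
p_hat = successes / n  # ~0.708

# Step 2: H0: pi = 0.5  vs  Ha: pi > 0.5

# Step 3: simulate the null distribution
# (10,000 sample proportions of size n under pi = 0.5)
null_stats = [
    sum(random.random() < 0.5 for _ in range(n)) / n
    for _ in range(10_000)
]

# Step 4: p-value = proportion of simulated statistics at least as extreme
p_value = sum(stat >= p_hat for stat in null_stats) / len(null_stats)
print(f"p-hat = {p_hat:.3f}, p-value ~= {p_value:.3f}")

# Step 5: a small p-value (e.g. <= 0.05) is evidence against H0
```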
Another way (than p-value) to measure strength of evidence is to
standardize the observed statistic by
measuring how far it is from the mean of the
null distribution using standard deviation units.
This measure is commonly noted $z$.
$z=\frac{\textrm{observed statistic}-\textrm{mean of null distribution}}{\textrm{standard deviation of null distribution}}$
Mean of null distribution:
$\mu = 0.5$ ($\pi$)
SD of null distribution:
???
In the early 1900s (and even earlier), computers weren't available to run simulations, and as people didn't want to sit around and flip coins all day long, they focused their attention on mathematical and probabilistic rules and theories that could predict what would happen if someone did simulate.
They proved the following result:
Central limit theorem:
If the sample size (n) is large enough, the distribution of sample
proportion will be bell-shaped (or normal), centered at the long-run
proportion, with a standard deviation of
$\sqrt{\frac{\pi(1-\pi)}{n}}$
Validity conditions:
The normal approximation can be thought of as a prediction of what would occur if
simulation was done. Many times this prediction is valid, but
not always. The prediction is considered valid when there
are at least 10 successes and 10 failures in the sample.
Applied to the coin flip model, this corresponds to at least 10 heads and 10 tails in the set of flips.
In the second experiment, Buzz got 16 correct choices out of 28 attempts.
$z=0.756$: the number of correct choices is 0.756 SD above the mean of the null normal distribution.
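That $z$ value follows directly from the central limit theorem formula above. A short check (variable names are mine):

```python
import math

# z-statistic for Buzz's second experiment: 16 correct out of 28, null pi = 0.5
pi, n = 0.5, 28
p_hat = 16 / 28

mean_null = pi                          # mean of the null distribution
sd_null = math.sqrt(pi * (1 - pi) / n)  # CLT standard deviation, ~0.0945

z = (p_hat - mean_null) / sd_null
print(f"z = {z:.3f}")  # 0.756
```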
Note:
Mathematically, it is possible to calculate the exact probability of getting $\geq16$ heads in 28 tosses using the binomial distribution.
p = 0.2858
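The exact binomial calculation quoted in the note can be reproduced directly from the binomial probability mass function; a minimal sketch:

```python
from math import comb

# Exact probability of >= 16 heads in 28 fair tosses (binomial, pi = 0.5):
# P(X >= 16) = sum over k = 16..28 of C(28, k) * 0.5^28
n, k = 28, 16
p_exact = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"p = {p_exact:.4f}")  # 0.2858
```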
Which method gives the p-value closest to the true probability? Why?
Lots of assumptions & required validity conditions!
Resampling simulation | Normal approximation (Z-test) |
---|---|
Collect your sample and calculate your statistic of interest ($\hat{p}$) | Collect your sample and calculate your statistic of interest ($\hat{p}$) |
State your null (H$_0$) and alternative (H$_a$) hypotheses | State your null (H$_0$) and alternative (H$_a$) hypotheses |
Directly simulate your null hypothesis distribution (this results in a resampling distribution centered at the stated H$_0$ proportion $\pi$) | Consider the normal distribution centered at the H$_0$ proportion $\pi$, with a standard deviation of $\sqrt{\frac{\pi(1-\pi)}{n}}$ |
Calculate the proportion of resampling samples that resulted in cases at least as extreme as your initial observed statistic. This is your p-value | Determine the proportion of area under the curve that is at least as extreme as your initial observed statistic. This is your p-value |
Conclude about whether you have evidence in favor of or against the null hypothesis | Conclude about whether you have evidence in favor of or against the null hypothesis |
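To make the comparison concrete, the z-test column can be sketched for Buzz's second experiment (16 out of 28). This uses the standard normal tail area computed via the error function; variable names are mine. Note that the resulting p-value (~0.225) is farther from the exact binomial value of 0.2858 than the simulation's ~0.280, since $n=28$ with 16 successes and 12 failures only barely meets the validity conditions.

```python
import math

# Normal-approximation (z-test) p-value for 16 correct out of 28, pi = 0.5
pi, n = 0.5, 28
p_hat = 16 / 28
z = (p_hat - pi) / math.sqrt(pi * (1 - pi) / n)

# Upper-tail area of the standard normal, via the error function:
# P(Z >= z) = 0.5 * (1 - erf(z / sqrt(2)))
p_normal = 0.5 * (1 - math.erf(z / math.sqrt(2)))
print(f"z = {z:.3f}, normal-approximation p-value = {p_normal:.3f}")
```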