# Statistics

**Statistics** is a mathematical discipline, built largely on probability, that provides techniques for drawing conclusions from observed data. It is heavily relied upon in most scientific fields (experiments and observational studies), as well as in business and industry (marketing, operations management, economic analysis, quality control) and government (public polling, policy analysis).

Statistics can be divided into two large subfields, *descriptive statistics* and *inferential statistics*:

- *Descriptive statistics*: numerical and graphical summaries of data.
- *Inferential statistics*: probability-based analysis used to infer something about a larger, mostly unobserved, *population* based on what you see in a *sample* from that population.
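As a rough illustration of the descriptive side, the sketch below computes a few common numerical summaries with Python's standard `statistics` module. The function names `describe` and `standard_error` and the sample values are hypothetical, chosen only for this example. The standard error is included as one simple bridge toward inference: it estimates how much the sample mean would vary from one sample to another.

```python
import statistics


def describe(sample):
    """Return a few common numerical summaries of a sample."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(sample),
    }


def standard_error(sample):
    """Estimate the standard error of the sample mean."""
    return statistics.stdev(sample) / len(sample) ** 0.5


# Hypothetical measurements, used only for illustration.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(measurements)
print(summary)
print(standard_error(measurements))
```

The summaries describe only the eight values in hand. Saying anything about the population they came from requires the probability-based reasoning of inferential statistics.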

A central idea in inferential statistics that is widely misunderstood is statistical significance. In an experiment to test, say, whether a new drug to treat a disease is more effective than the old, standard treatment, the degree of improvement the new drug provides is said to be *statistically significant* if it is so large as to be unlikely to have occurred by chance alone.

In other words, if one sees a large (or, indeed, *any*) improvement using the new drug (over what is expected or observed using the standard treatment), there are two possible explanations:

- There is actually no overall benefit to the new drug, and the sample results simply occurred "by chance" because of individual differences in how people respond to medical treatments.
- There *is* a benefit to the new drug over the old, and the sample results are simply reflecting this fact.

The larger the improvement actually observed in the sample (or, for a given improvement, the larger the sample sizes used), the less convincing explanation #1 becomes, and the more convincing explanation #2 becomes. Statistics gives a way of calculating the *probability* of seeing an improvement at least as large as the one observed, *assuming* there is no overall benefit. This conditional probability is called the *p-value*. It is not the probability that explanation #1 is true, which is a common misreading. The smaller the p-value, the harder the data are to reconcile with explanation #1, and the stronger the evidence for explanation #2.
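One way to make this calculation concrete is an exact permutation test, sketched below. This is only one of several methods for obtaining a p-value, and the article does not prescribe it. The function name `permutation_p_value` and the improvement scores are hypothetical. The idea: if the drug has no overall benefit, the group labels are arbitrary, so every way of splitting the pooled scores into two groups is equally likely. The p-value is the fraction of splits that give a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(new, standard):
    """One-sided exact permutation p-value for mean(new) - mean(standard).

    Enumerates every relabeling of the pooled data, so it is only
    practical for small samples.
    """
    observed = mean(new) - mean(standard)
    pooled = list(new) + list(standard)
    n_new = len(new)
    n_std = len(pooled) - n_new
    total = sum(pooled)

    at_least_as_large = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_new):
        group_sum = sum(pooled[i] for i in idx)
        diff = group_sum / n_new - (total - group_sum) / n_std
        # Small tolerance so the observed split itself is always counted.
        if diff >= observed - 1e-12:
            at_least_as_large += 1
        n_splits += 1
    return at_least_as_large / n_splits


# Hypothetical improvement scores, used only for illustration.
new_drug = [3, 4, 5]
standard_treatment = [0, 1, 2]
print(permutation_p_value(new_drug, standard_treatment))
```

With these scores, only the observed split out of the 20 possible ones produces a difference this large, so the p-value is 1/20. A small value like this says the data would be unusual if there were no benefit. It does not give the probability that there is no benefit.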