Descriptive Statistics

The first instinct of the scientist should be to formulate a question of interest carefully, and to collect some data about this question. How to collect good data is a real and important issue, but one we will discuss later. Let us instead assume for the moment that we have some data, good or bad, and first consider what to do with them.[1] In particular, we want to describe them, both graphically and with numbers that summarize some of their features.

We will start by defining some basic terminology (words like individual, population, variable, mean, and median), which it will be important for the student to understand carefully and completely. So let’s briefly discuss what a definition is in mathematics.

Mathematical definitions should be perfectly precise because they do not describe something which is observed out there in the world, where such descriptive definitions might have fuzzy edges. In biology, for example, whether a virus is considered “alive” could be subject to some debate: viruses have some of the characteristics of life, but not others. This makes a mathematician nervous.

When we look at math, however, we should always know exactly which objects satisfy some definition and which do not. For example, an even number is a whole number which is two times some whole number. We can always tell whether some number $n$ is even, then, by simply checking if there is some whole number $k$ for which the arithmetic statement $n=2k$ is true: if so, $n$ is even; if not, $n$ is not even. If you claim a number $n$ is even, you need just state what is the corresponding $k$; if you claim it is not even, you have to somehow give a convincing, detailed explanation (dare we call it a “proof”) that such a $k$ simply does not exist.
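As a small illustration of how mechanical such a check can be, here is a minimal Python sketch of the definition of evenness. The function name `even_witness` is made up for this example, and we assume that "whole number" means integer here. For an even $n$ the function returns the corresponding $k$ with $n=2k$; for any other $n$ it returns `None`, which is justified because dividing by 2 leaves a remainder of 1.

```python
def even_witness(n: int):
    """Return the whole number k with n == 2*k if n is even, else None.

    Hypothetical helper for illustration; assumes n is a Python int.
    """
    # divmod gives the quotient k and the remainder r, with n == 2*k + r
    k, r = divmod(n, 2)
    # A remainder of 0 means n == 2*k exactly, so k is the witness we want.
    return k if r == 0 else None


print(even_witness(10))  # 10 is even: a witness k exists
print(even_witness(7))   # 7 is not even: no witness exists
```

Returning the witness $k$ itself, rather than just true or false, mirrors the point made above: a claim that $n$ is even is settled by exhibiting the corresponding $k$.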

So it is important to learn mathematical definitions carefully: to know what the criteria are for a definition, and to know examples that satisfy the definition and other examples that do not.

Note, finally, that in statistics, since we are using mathematics in the real world, there will be some terms (like individual and population) which will not be exclusively in the mathematical realm and will therefore have less perfectly mathematical definitions. Nevertheless, students should try to be as clear and precise as possible.

The material in this Part is naturally broken into two cases, depending upon whether we measure a single thing about the individuals in some population of interest or we make several measurements. The first case is called one-variable statistics, and will be our first major topic. The second case could potentially go as far as multi-variable statistics, but we will mostly talk about situations where we make two measurements, our second major topic. In this case of bivariate statistics, we will not only describe each variable separately (both graphically and numerically), but we will also describe their relationship, graphically and numerically as well.

  1. The word "data" is really a plural, corresponding to the singular "datum." We will try to remember to use plural forms when we talk about "data," but there will be no penalty for the (purely grammatical) failure to do so.



Lies, Damned Lies, or Statistics, v2 by Jonathan A. Poritz is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
