- Definition 1: A statistic is a
quantity computed from a sample that can be used for a statistical
purpose (inference, if you’ve heard the term; otherwise, you can
loosely view inference as decision making).
- Definition 2: A descriptive
statistic is a statistic that summarizes or conveys information
about some aspect of a sample of data.
Popular types of information to convey using descriptive statistics
are:
- Central tendency - mean or median value
- Variability - standard deviation, range, interquartile
range
- Frequencies or relative frequencies - counts in a category
or range, proportions, rates
- Relationships - correlation coefficient, difference in
means or proportions, odds ratio/relative risk
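As a rough illustration of the first three categories, here is a minimal Python sketch using only the standard library. The `ages` sample is made up for this example, and the quartiles use the default method of `statistics.quantiles`; other software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical sample (made-up values): ages of 8 survey respondents.
ages = [23, 25, 31, 35, 35, 42, 58, 71]

# Central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Variability
sd_age = statistics.stdev(ages)               # sample standard deviation
age_range = max(ages) - min(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)   # quartiles, default method
iqr_age = q3 - q1

# Relative frequency: proportion of respondents aged 40 or older
prop_40_plus = sum(a >= 40 for a in ages) / len(ages)

print(mean_age, median_age, round(sd_age, 1), age_range, iqr_age, prop_40_plus)
```

Note that the mean and median differ here because a few older respondents pull the mean upward, which is one reason the choice of statistic matters.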
The goal of any descriptive statistic is to accurately
convey important information without needing to showcase the
entire set of data. For example, a correlation coefficient can provide a
good summary of the relationship in a scatter plot (though not in all
cases!).
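To see one way the correlation coefficient can fall short, consider the sketch below. It assumes the Pearson correlation coefficient and uses two made-up data sets: one perfectly linear and one perfectly curved. The helper `pearson_r` is written out by hand for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (made up for illustration)
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y is a straight-line function of x
curved = [x ** 2 for x in xs]      # y is fully determined by x, but not linearly

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)

print(r_linear, r_curved)
```

In the curved case the coefficient is essentially zero even though y is completely determined by x, so the single number would badly understate the relationship visible in a scatter plot.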
\(~\)
Practice
Descriptive statistics are an essential tool when working with data;
however, there are often many different descriptive statistics that
could be used to convey information. It is important to understand the
types of choices that can be made and their implications.
Scenario 1: Suppose you’d like to describe how
financially well off the residents of a particular neighborhood are.
Additionally, suppose you have the power to ascertain any
information about each resident that you think might be useful. What would you
share to most accurately convey this aspect of the neighborhood?
- Question: Each of the following is a descriptive
statistic one might consider in this scenario. Identify one or more
limitations of each (assuming your goal is to provide accurate
information), then choose the 1 or 2 descriptive statistics that you
feel are most effective.
- The aggregate income, which is the sum of the
annual incomes of all residents of the neighborhood
- The average individual income, which is the
aggregate income divided by the number of individual
residents
- The average household income, which is the
aggregate income divided by the number of households (which
each may contain one or more residents)
- The total wealth, which is the sum of the net worth
(assets - debts) for every resident
- The average individual wealth, which is the total
wealth divided by the number of individual residents
- The average household wealth, which is the total
wealth divided by the number of households
\(~\)
- Remarks: The question in Scenario 1 can be broken down into 3
different choices:
- choice of the variable/attribute to focus on
- use of raw/crude numbers vs. rates/scaled numbers
- choice of denominator when opting for a rate or scaled number
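The second and third choices can be made concrete with a small Python sketch. The neighborhood below is entirely hypothetical: each inner list is one household, and each number is one resident's annual income. It only shows how the same raw total turns into very different figures depending on the denominator; it is not meant to suggest which statistic is best.

```python
# Hypothetical neighborhood (made-up values): one inner list per household,
# one annual income per resident (0 for residents with no income).
households = [
    [52_000, 48_000],
    [61_000],
    [0, 0, 75_000],
    [38_000, 41_000, 0, 0],
    [950_000],
]

# Raw/crude number: the aggregate income
aggregate_income = sum(sum(h) for h in households)

# Two possible denominators for a scaled number
n_residents = sum(len(h) for h in households)
n_households = len(households)

avg_individual_income = aggregate_income / n_residents
avg_household_income = aggregate_income / n_households

print(aggregate_income, avg_individual_income, avg_household_income)
```

The same calculation applies to wealth by replacing each income with a resident's net worth, which corresponds to the first choice (the variable to focus on).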
\(~\)
Scenario 2: Suppose a new disease has emerged in
your community and you are trying to decide how worried you should be.
Further, suppose you have the ability to ascertain any descriptive
statistic related to the impacts of this disease without any barriers or
challenges; however, you may only rely upon one descriptive
statistic.
- Question:
- Part A: What variable/attribute would you focus on?
Some examples might include infections, or deaths, or hospitalizations,
etc.
- Part B: Would you use a raw/crude number or a
rate/scaled number?
- Part C: What descriptive statistic would you
ultimately rely upon?
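When thinking about Part B, it may help to see how a crude count and a rate can point in opposite directions. The sketch below uses two hypothetical communities with made-up populations and death counts, and assumes a rate expressed per 100,000 residents (a common but not universal convention).

```python
# Hypothetical communities (made-up numbers for illustration only)
communities = {
    "Community A": {"population": 2_000_000, "deaths": 400},
    "Community B": {"population": 50_000, "deaths": 50},
}

def rate_per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    return count * 100_000 / population

rates = {
    name: rate_per_100k(info["deaths"], info["population"])
    for name, info in communities.items()
}

for name, info in communities.items():
    print(name, "deaths:", info["deaths"], "rate per 100,000:", rates[name])
```

Here Community A has more deaths in total, while Community B has the higher rate, so the two summaries would lead to different impressions of the disease.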
\(~\)
- Remarks: In reality you aren’t limited in how many
descriptive statistics you can share in the context of a particular
argument. However, more is not better.
- The Gish
gallop is an argumentation strategy that attempts to overwhelm
someone by providing an excessive amount of information with no
regard for quality or accuracy.
- Brandolini’s
law is the idea that the amount of effort required to refute a
faulty claim is orders of magnitude greater than the effort needed to
make the faulty claim.
- An awareness of these ideas can help shape the way you approach
argumentation/debate.