Schedule and Syllabus
Welcome to the course website for Sta-209, Applied Statistics! On this page you can find all materials we’ll use throughout the semester, starting with the syllabus linked below:
You can find course content by scrolling, or by using the navigation bar in the upper left. Please note:
- I will occasionally not post a lecture in advance if it contains some type of “surprise” that I don’t want you to see ahead of time.
- Links may not become active until we’ve reached the appropriate point in the course, so if you get a “404 page not found” error it is likely because we haven’t reached that topic yet.
Labs
Labs and due dates are listed below. Please note that during or after a lab you have the opportunity to report on the participation of your group members using this anonymous form.
Recommended Readings
The readings listed below are recommended prior to the date indicated. It has been my observation that students who complete these readings have been the most successful.
- Ch 1.1 (Wed 1/22)
- Ch 2.1, 2.2, 2.3 (Fri 1/24)
- Ch 2.4, 2.5 (Mon 1/27)
- Ch 2.6 (Wed 1/29)
- Ch 1.2 (Fri 1/31)
- Ch 1.3 (Wed 2/5)
- Ch 3.1 (Fri 2/7)
- Ch 3.2 (Mon 2/10)
- Ch 3.3, 3.4 (Wed 2/12)
- Ch 5.1 (only pages 372-275) and Ch 5.2 (Fri 2/14)
- Ch 6.1-CI and Ch 6.3-CI (Mon 2/17)
- Ch 6.2-CI and Ch 6.4-CI (Wed 2/19)
- Ch 4.1 and 4.2 (Mon 2/24)
- Ch 4.3 (Wed 2/26)
- Ch 4.4 and 4.5 (Wed 3/4)
- Ch 5.1 (Fri 3/6)
- Ch 6.1-HT and Ch 6.3-HT (Mon 3/9)
- Ch 6.2-HT, Ch 6.3-HT, and Ch 6.5 (Week of 3/31 - 4/3)
- Ch 7.1 and 7.2 (Week of 4/6 - 4/10)
- Ch 8.1 and 8.2 (Week of 4/13 - 4/17)
- Ch 9.1, 9.2, and 9.3 (Week of 4/20 - 4/24)
- Ch 10.1, 10.2, and 10.3 (Week of 4/27 - 5/1)
Homeworks
Homework is generally due at the start of class every Friday (with a few exceptions near exams). You can always turn in an assignment early, but if you’d like to request an extension to turn in an assignment late, you must contact me at least 24 hours prior to the posted deadline and submit the assignment electronically (a scanned or typed version sent via email).
- HW #1: 1.15, 1.16, 1.22, 2.20, 2.66, 2.116, 2.117, 2.118
  - Data needed for some exercises is available here
  - due date: Friday 1/31
- HW #2: 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.60, 1.91, 1.92, 1.100, 1.103
  - No explanations are necessary for 1.40 - 1.45
  - due date: Friday 2/7
- HW #3: 3.25, 3.26, 3.29, 3.58, 3.64, 3.93, 3.94, 3.102, 3.125, 3.130
- HW #4: 5.43, 5.52, 6.20, 6.26, 6.30, 6.96, 6.97, 6.98
- HW #5: 6.143, 6.148, 6.195, 6.196, 6.206
- HW #6: 4.16, 4.18, 4.65, 4.66, 4.68, 4.101, 4.102, 4.113, 4.114
- HW #7: 4.144, 4.145, 6.56, 6.57, 6.60, 6.163, 6.164, 6.165
- HW #8: 6.121, 6.122, 6.215, 6.216, 6.250, 6.256, additional questions #1 and #2
  - due date: Friday 4/3 at 1:00pm Grinnell time
  - Please submit your assignment as a single file in the open box on Pioneerweb
- HW #9: 7.13, 7.20, 7.26, 7.27, 7.43, 7.44, 7.46
  - due date: Friday 4/10 at 1:00pm Grinnell time
- HW #10: 8.8, 8.15, 8.16, 8.17, 8.18, 8.32
  - due date: Monday 4/20 at 1:00pm Grinnell time
- HW #11: 9.15, 10.23, 10.24, 10.26, 10.27, additional question #1
  - due date: Friday 5/1 at 1:00pm Grinnell time
Announcements and Other Materials
- Please complete the survey at this link sometime during the first week of class.
- Exam #1 is tentatively planned for Friday 2/28
  - Study materials will be posted on p-web at least 1 week prior to the exam
- Details regarding the Final Project can be found at this link
  - The first deadline is Friday 3/6 at 11:59pm, when you need to choose who you’ll be working with
  - At this link you may find the current list of groups
  - The second deadline is Monday 3/30 at 11:59pm, when your group must submit a project proposal
  - The third deadline is Monday 4/27 at 11:59pm, when your group is to submit your cleaned dataset and data dictionary
- Click here for information on distance learning
- Exam #2 will be made available at 8:00am Grinnell time on Friday 4/10 and must be submitted by 8:00am Grinnell time on Saturday 4/11
  - Study materials are posted on p-web
  - Solutions to the practice exam will be posted on Wednesday 4/8
Datasets
- Happy Planet
  - This dataset was assembled by The Happy Planet Index using data from a global survey that asks respondents questions about how they feel their lives are going. It documents the health and well-being of the inhabitants of various nations around the world.
- Mass Shootings
  - This dataset was originally assembled in response to the movie theater shooting in Aurora, Colorado by Mother Jones, a liberal news organization. It documents shootings in the United States where a lone gunman (with a few exceptions involving two shooters) killed at least four individuals (not including themselves) at a single location (with a few exceptions involving multiple locations within a short period). Variables include: demographic characteristics of the shooter, information on when/where the shooting occurred, information on the number of victims, and information about the mental health status of the shooter.
- Death Penalty Sentencing
  - This dataset comes from a widely cited study on racially biased sentencing in the Florida court system during the 1970s. Researchers collected data on all murders that took place during a felony committed in the state of Florida between 1972 and 1977. They recorded the race of the victim and the offender, as well as whether the offender was sentenced to the death penalty.
- Infant Heart Surgery
  - This dataset contains the results of a randomized experiment conducted by surgeons at Harvard Medical School to compare the “low-flow bypass” and “circulatory arrest” surgical approaches in the treatment of infants born with congenital heart defects. The outcomes recorded are Psychomotor Development Index (PDI), a composite score measuring physiological development, with higher scores indicating greater development, and Mental Development Index (MDI), a composite score measuring mental development, with higher scores indicating greater development.
- San Francisco Mall Shoppers
  - In 1987, Impact Resources Inc. surveyed 9409 shopping mall customers in the San Francisco Bay area (San Francisco, Oakland, and San Jose). This dataset contains the responses of 6876 shoppers who completed the survey. It includes demographic characteristics such as: age, sex, income, marital status, education, occupation, household size, home type, and ethnicity.
- Lead IQ
  - CDC researchers collected data in El Paso, Texas from samples of children aged 3-15 living near (within 1 mile) and far (more than 1 mile away) from a local lead smelter. This dataset documents the dependent variable, age-adjusted IQ score, for a subset of those children. These data were obtained from the textbook: Fundamentals of Biostatistics by B. Rosner.
- Wetsuits
  - This dataset contains 1500m swim velocities for 12 competitive swimmers when wearing a specially designed wetsuit and without the wetsuit. These data were obtained from the Lock5stat data page.
- Professor Salaries
  - This dataset contains 9-month academic year salaries for faculty members at a major US public university. In addition to salaries from the 2008-09 academic year, the dataset contains de-identified documentation of the sex, rank, discipline, and experience of each faculty member. These data were obtained from the carData R package.
- Police Killings
  - This data originates from the FiveThirtyEight article Where Police Have Killed Americans in 2015. It contains demographic and geographic information on everyone killed by the police in the year 2015, including the person’s name, age, race, gender, cause of death, and whether the person was armed. It was merged to include the poverty, unemployment, and college education rates of the census tract where the killing took place.
- Golden State Warriors
  - This dataset documents each of the 82 games in the record-setting 2015-16 Golden State Warriors season. It was obtained from Basketball Reference.
- TSA Claims
  - The Transportation Security Administration (TSA) is an agency within the US Department of Homeland Security that has authority over the safety and security of travel in the United States. This dataset documents claims made by travelers against the TSA between 2003 and 2008, including information on the claim type, claim amount, and whether it was approved, settled, or denied.
- Tailgating and Drug Use
  - Participants were recruited based upon their self-reported drug use and asked to follow an erratically behaving lead vehicle as part of a driving simulator experiment under the working hypothesis that regular users of certain drugs would exhibit more risky driving behavior. The data were derived from a study conducted by the National Advanced Driving Simulator organization.
- Colleges
  - These data come from “The College Scorecard”, a government-run database containing information on all degree-granting higher education institutions. This dataset is a subset of the Scorecard’s 2018 dataset; it includes only small colleges (1000-5000 students) that primarily award bachelor’s degrees and require the ACT for admission.
- Tips
  - These data were recorded by a waiter in a national chain restaurant located in a suburban shopping mall in the early 1990s. The data document various aspects of each table served by the waiter, including the total bill, tip, size of the party, time of day, day of the week, and whether the party included a smoker. The data were originally obtained from the textbook: Interactive and Dynamic Graphics for Data Analysis: With R.
- Halo Effect
  - The “halo effect” is a theory suggesting that positive ratings in one aspect of a person, brand, or object lead to favorable views of other aspects. This dataset is based upon results from a 1974 experiment described in the article “Beauty is Talent: Task Evaluation as a Function of the Performer’s Physical Attraction” published in the Journal of Personality and Social Psychology. In the study, 60 male undergraduate readers were asked to score an essay said to be written by a female undergraduate. 20 readers were given a photo of an attractive female said to be the essay’s author, 20 readers received a photo of an unattractive female said to be the author, and 20 readers received no photo.
- Commute Tracker
  - This dataset documents the daily commutes of a worker residing in the greater Toronto region of Canada. It was collected via a GPS tracking app, and documents several variables related to time, distance, speed, delays, and date. I came across this dataset at this ML workshop GitHub page; I am unsure of its origin.
- Iowa City Home Sales
  - This dataset contains information on homes sold in Iowa City, IA between 2005 and 2008. It was scraped from the Johnson County assessor website, and it contains information such as the home’s sale price, assessed value, square footage, and features.
- Breast Cancer Survival
  - A brief description can be found in Lab 10