Instructor:
Class Meetings:
Office Hours:
Class Mentors:
Course Description:
Machine learning is a branch of artificial intelligence (AI) rooted in computational statistics that focuses on the development of models and algorithms capable of identifying patterns in data. In this course, students will implement machine learning methods in Python to solve problems from a variety of disciplines. Topics include model validation and optimization, decision trees, boosting and bagging, neural networks, transfer learning, and recently developed methods. Students will complete a semester-long capstone project. Prerequisite: MAT-215 and STA-230.
Texts:
There is no required textbook for this course. All required materials will be posted on our course website.
Most of our course materials are based upon information in the following textbooks:
This course aims to develop conceptual, theoretical, and applied perspectives on commonly used machine learning algorithms.
After completing this course, students should be able to:
Class Sessions
Class time will be split between “lecture” and “lab”. Most class meetings will begin with a short, low-stakes quiz (2-3 conceptual questions) to provide an incentive for consistent attendance and review of course content outside of class. Additional details on these quizzes can be found in the “Grading” section of the syllabus.
Portions of the class devoted to “lecture” will focus on conceptual and mathematical topics with minimal discussion of coding/software.
Portions of class devoted to “lab” involve working in assigned and/or self-formed groups of 2-3 on applications of course topics using Python.
Attendance
Lab sessions will involve collaboration with others in assigned and self-selected groups. This means that attendance during labs is especially important. While I understand that missing class is sometimes necessary, if you will be absent for any reason I ask to be notified as soon as possible so that assigned lab groups can be modified. Showing up late or missing class more than once without prior notice will negatively impact the participation component of your course grade.
Software
Software is an essential component of machine learning, and this class will make extensive use of Python (any version 3 release should be fine). You are welcome to use any Python IDE that you are familiar with, but I encourage you to use Jupyter to record your work on labs/assignments.
You are welcome to use your own personal laptop or a classroom computer during the course. If you are working on your own laptop, I suggest downloading the most recent Anaconda Distribution to ensure compatibility with the code examples that will be given during class. Jupyter Notebook and JupyterLab can both be found in the Anaconda Navigator, and Anaconda comes with most of the libraries we’ll be using pre-installed.
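If you are unsure whether your installation satisfies the "any version 3 release" guideline, a quick check along the following lines may help. This is only a minimal sketch: the helper name `is_python3` is made up for illustration and is not part of any course material.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version number is 3."""
    return version_info[0] == 3


# Report the running interpreter's version and whether it is a version 3 release.
print("Python", sys.version.split()[0])
print("Version 3 release:", is_python3())
```

Running this in a Jupyter cell or from a terminal should work the same way, since it relies only on the standard library.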
There are popular cloud-based platforms capable of running Python code that you might opt to use; however, these platforms are not sanctioned by Grinnell College.
Academic Honesty
At Grinnell College you are part of a conversation among scholars, professors, and students, one that helps sustain both the intellectual community here and the larger world of thinkers, researchers, and writers. The tests you take, the research you do, the writing you submit: all these are ways you participate in this conversation.
The College presumes that your work for any course is your own contribution to that scholarly conversation, and it expects you to take responsibility for that contribution. That is, you should strive to present ideas and data fairly and accurately, indicate what is your own work, and acknowledge what you have derived from others. This care permits other members of the community to trace the evolution of ideas and check claims for accuracy.
Failure to live up to this expectation constitutes academic dishonesty. Academic dishonesty is misrepresenting someone else’s intellectual effort as your own. Within the context of a course, it also can include misrepresenting your own work as produced for that class when in fact it was produced for some other purpose. A complete list of dishonest behaviors, as defined by Grinnell College, can be found here.
Inclusive Classroom
Grinnell College makes reasonable accommodations for students with documented disabilities. To receive accommodations, students must provide documentation to the Coordinator for Disability Resources; information can be found here. If you plan on using accommodations in this course, you should speak with me as early as possible in the semester so that we can discuss ways to ensure your full participation in the course.
Religious Holidays
Grinnell College encourages students who plan to observe holy days that coincide with class meetings or assignment due dates to consult with their instructor in the first three weeks of classes so that they may reach a mutual understanding of how to meet both the terms of their religious observance and the requirements of the course.
Getting Help
In addition to visiting office hours and completing the recommended readings, there are many other ways in which you can find help on assignments and projects.
The Data Science and Social Inquiry Lab (DASIL) is staffed by mentors who are experienced programmers and may be able to troubleshoot coding problems you are having. Many students who’ve successfully completed this course have made extensive use of the DASIL work space and its computing resources.
The online platform Stack Overflow is a useful resource for finding user-generated solutions to coding questions. Nearly all professional data scientists have needed to “look up” a coding strategy on a site like Stack Overflow at some point in their career, and I have no problem with you doing the same on assignments or projects. However, if you make substantial use of a Stack Overflow answer (i.e., actually integrating lines of code written by someone else into your work, not just getting help identifying the right functions/arguments), the expectation is that you cite or acknowledge doing so.
Large Language Models and AI
Grinnell’s college-wide Academic Honesty policy requires that use of generative AI be appropriately cited or acknowledged as any other source would be. In this course you are fully permitted to use generative AI for assistance on in-class work or homework assignments, so long as you properly acknowledge your use and work within the statistical and coding frameworks described in our lectures and labs. Using generative AI to produce solutions that are inconsistent with the approaches and methods discussed in our lectures and labs will result in low scores on assignments. Some particular cases where generative AI can be helpful in this course include: checking your written work for errors or typos, understanding coding error messages, and explaining example code in a more interactive manner. If you decide to use generative AI to assist with in-class work or homework, it is essential for you to recognize that you will not have access to these tools on in-class exams, which comprise the majority of your end-of-semester grade. Thus, it is critical that you use AI as a tool, not as a replacement for your own thinking and understanding of course topics.
Engagement, Labs, and Personal Growth - 15%
In-class labs contain embedded questions that you and your lab partner(s) should answer together in a single document. A few select lab questions will be scored for accuracy with feedback given, while most will be scored for effort/completion. Your submitted lab work will contribute to roughly half of this grade category.
By the end of the semester you are to submit a 1-2 page growth statement expressing how you’ve deepened your understanding of machine learning and furthered your academic or career goals throughout the semester. There is no required structure to this statement, and it will be used in conjunction with my own observations throughout the semester to form an engagement score comprising the remaining half of this grade category. To facilitate this, you should set specific goals for yourself in the course and keep track of specific events that you believe support your progress towards these goals.
Daily engagement in a lab-heavy course is absolutely critical. During labs you are expected to help your partner(s) learn the material (which goes beyond simply answering the lab questions), and your partner is expected to help further your understanding. Everyone will begin the semester with a baseline engagement score of 80, which will move up or down depending on my subjective assessment of your behavior during class. You can very quickly raise this score by helping your lab partner(s), and working diligently to understand course material during class. Alternatively, you can lower this score by skipping class, letting your lab partner(s) do most of the work, using your phone or surfing the web during class, etc. Reports from lab or project partners that you are not contributing equally to group efforts may also influence this score.
Homework - 20%
There will be 5-6 homework assignments throughout the semester. These will contain a mixture of mathematical/theoretical, applied, and written/conceptual questions.
In-class Quizzes - 15%
Short in-class quizzes will be delivered during the first 5 minutes of most class meetings. These quizzes will contain 1-3 brief questions, often multiple choice, covering concepts from previous lectures. These quizzes are intended to cover essential concepts that any machine learning specialist should have committed to memory. Quizzes cannot be retaken or made up, but your lowest two quiz scores will be dropped at the end of the semester. Pending instructor approval and coordination with academic support staff, special exceptions may be made for circumstances involving prolonged absences.
Midterm Exam - 20%
There will be an in-class exam roughly two-thirds of the way into the semester (sometime in November). The precise date/time will be announced no later than two weeks in advance of the exam. This exam focuses on conceptual and mathematical topics from the course, so it is intended to be complementary to the applied focus of the capstone project (see below). Details and review materials will be provided later in the semester.
Capstone Project - 30% (Presentation: 7%, Report: 18%, Code: 5%)
For this project you will work in a group of 2-4 students on a self-selected machine learning problem involving a non-trivial data set of your choosing. Your group will be responsible for creating a repository containing the code and data used during the project. You will present your results in a 10-minute in-class presentation intended to mirror a scientific conference presentation (i.e., assume a modest amount of machine learning knowledge from the audience, with limited content area knowledge in the domain of your application). You will also prepare a 3-page (single-spaced, not including figures) scientific report summarizing your methods and results.
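To see how the category percentages listed in this syllabus fit together, the sketch below combines them into a single score. It assumes that each category is scored on a 0-100 scale and that the overall score is a simple weighted average; the syllabus does not spell out either detail, and it gives no letter-grade cutoffs. The function name and the example scores are hypothetical.

```python
# Category weights as listed in this syllabus (they sum to 100%).
WEIGHTS = {
    "engagement_labs_growth": 0.15,
    "homework": 0.20,
    "quizzes": 0.15,
    "midterm": 0.20,
    "capstone": 0.30,
}


def weighted_course_score(scores):
    """Combine category scores (assumed 0-100 each) into a weighted average."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical category scores, for illustration only.
example_scores = {
    "engagement_labs_growth": 80,
    "homework": 90,
    "quizzes": 85,
    "midterm": 75,
    "capstone": 88,
}
print(round(weighted_course_score(example_scores), 2))
```

Because the capstone carries the largest single weight, a change of a few points there moves the overall score more than the same change in any other category.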
Below is a list of course topics and tentative time frames for covering them.