This class will be accepted in place of QTM 220 as a prerequisite and there is substantial overlap in content between the two. However, there will be a greater emphasis on the precision with which we move from stories and intuition to formal mathematical reasoning (and back) in this course. Data visualization, including sketching by hand and plotting in R, will be emphasized as a tool for making this connection. Being precise about how and why our methods work makes it easier to adapt them to answer new questions and work with new types of data. We'll practice this important skill often, e.g. by looking into how to make predictions that tell us about what happens in 90% of cases rather than in the typical case (quantile regression) and how to compare different treatments for a terminal illness using time-to-event data (survival analysis).
I will hold office hours on Mondays from 2:00-4:00 in PAIS 583, except on university holidays.
You'll need some basic R programming skills, too. You'll be reading and reusing code rather than writing it from scratch, so if you're not familiar with R but have experience programming in another language, you should be fine. As with the math skills, we'll review as we go.
Formally, the prerequisites are the same as for QTM 220: QTM 210, Econ 220, or Math 361 and 362 for probability; Math 210 or 211 for calculus; Math 221 for linear algebra; and QTM 150 for R programming, or their equivalents. If you're missing some of these but willing to work as we go to fill in the gaps, I'm happy to have you in class and will do my best to help you succeed. If you're interested but not sure if you're prepared, reach out and we'll talk.
The core content and organization of this class and QTM 220 are similar, but this one will be a bit more mathematically rigorous and a bit broader in scope. In particular, in addition to using the homework to get some practice working with the methods we're covering in lecture, we'll use it to explore some variations on them. The intention is to develop skills and confidence that'll help you tackle new problems by adapting what you already know. As a result, this class will probably be a bit more work than QTM 220 at times.
If you're planning to learn more about machine learning or causal inference methods or work in an area that uses them, you'll probably find that there are some gaps between what you'll need there and what you'd learn in QTM 220. Or in this class. Gaps are inevitable. What's intended is that, with this one, you'll have a bit of a head start. You'll have the kind of understanding and practice that makes it easy to identify the gaps and to work through them. The time you'd save then might be worth the extra time you'd put in now taking this class. It might be more fun, too.
Week 0 | |
W Aug 28 | Introduction |
F Aug 30 | Probability Background: Sampling |
Homework | |
Week 1 | |
M Sep 2 | No Class. Labor Day. |
W Sep 4 | Point and Interval Estimates |
F Sep 6 | Probability Background: Random Variables |
Homework | |
Week 2 | |
M Sep 9 | Calibrating Interval Estimates with Binary Observations |
W Sep 11 | Calibrating Interval Estimates with Binary Observations (continued) |
F Sep 13 | Probability Background: Expectations |
Homework | |
Week 3 | |
M Sep 16 | Calibrating Interval Estimates using the Bootstrap |
W Sep 18 | Normal Approximation and Sample Size Calculation |
F Sep 20 | Conditional Expectations |
Homework | |
Week 4 | |
M Sep 23 | Comparing Two Groups |
W Sep 25 | Comparing Two Groups (continued) |
F Sep 27 | Review |
Practice Exam | |
Week 5 | |
M Sep 30 | Practice Exam Review |
W Oct 2 | Midterm 1 |
F Oct 4 | The Midterm 1 Solution |
Homework | |
Week 6 | |
M Oct 7 | Summarizing Trends involving Many Groups |
W Oct 9 | Inference for Complicated Summaries |
F Oct 11 | A Game of Telephone |
No Homework. Enjoy Your Fall Break. | |
Week 7 | |
M Oct 14 | No Class. Fall Break. |
W Oct 16 | Multivariate Analysis and Adjustment |
F Oct 18 | Multivariate Analysis and Adjustment (continued) |
Homework | |
Week 8 | |
M Oct 21 | Inference for Complicated Summaries (continued) |
W Oct 23 | Potential Outcomes and Causality |
F Oct 25 | Potential Outcomes and Causality (continued) |
Week 9 | |
M Oct 28 | Cancelled. |
W Oct 30 | Least Squares Regression in Linear Models |
F Nov 1 | Least Squares in R |
Homework | |
Week 10 | |
M Nov 4 | Cancelled. |
W Nov 6 | No Class. There was an Election Yesterday. |
F Nov 8 | No Class. There was an Election 3 Days Ago. |
Week 11 | |
M Nov 11 | The Behavior of Least Squares Predictors (revised) (from class) |
W Nov 13 | Misspecification and Inference |
F Nov 15 | Misspecification and Averaging |
Week 12 | |
M Nov 18 | Inverse Probability Weighting |
W Nov 20 | Application: Profit vs. Outcomes in Heart Attack Patients |
F Nov 22 | Discussion of Profit vs. Outcomes Papers |
Week 13 | |
M Nov 25 | Case Study |
W Nov 27 | No Class. Almost Thanksgiving Recess. |
F Nov 29 | No Class. Thanksgiving Recess. |
Week 14 | |
M Dec 2 | TBD |
W Dec 4 | Trees |
F Dec 6 | Image Denoising |
Take-Home Midterm. Posted Monday at end of class; Due Wednesday at 11:59 PM. | |
Extra Office Hours Tuesday and Wednesday from 2-3:45 | |
Week 15 | |
M Dec 9 | Wrap Up and Review |
Readings, assignments, announcements, and course information are available through our course Canvas site. When I need to communicate with everyone in the course, for example, to amend an assignment or reschedule my office hours, I will make an announcement on Canvas. You can adjust your Canvas notification settings to receive these announcements however you prefer. To contact me, please use email or come to office hours. Please do not message me via Canvas inbox or reply to comments on assignments on Canvas or Gradescope, as it's likely that I won't see those messages for a while. If feedback you receive via Canvas or Gradescope is unclear or you would like to discuss comments, please come to office hours or make an appointment to meet with me.
I prefer to speak with students in real time rather than via email. That helps us get to know each other better and tends to lead to more efficient communication. The office hours listed above are set aside entirely for you; you don't need to make an appointment and can come and go as you please. I prefer that you attend in person, but if you can't make it to campus, I'm happy to talk via Zoom. Join my meeting; I'll be there. If you'd like to meet outside of these hours, please email me to set up an appointment. I'll do my best to respond to emails within 48 hours.
I expect you to attend the majority of our class meetings. Even Fridays. That said, schedule conflicts and illness happen. For my sake and your classmates', please do not come to class or office hours sick. I will record the lectures and post recordings soon after class. There is no need to explain your absences, but please try to inform me of them in advance of class meetings.
Please bring a laptop with an updated version of R to every meeting, as you'll need it for some in-class activities and quizzes. It may be worth bringing a tablet, too, as the lecture slides on this website embed a little whiteboard app. You can take notes, do calculations, and draw sketches right on top of them. That's what I'm doing when I'm presenting the slides.
Homework will be assigned weekly except for the weeks of and the weeks before our midterm exams. These will be a mix of calculations by hand and computer, often accompanied by sketches or plots to illustrate what's going on visually; structured data analysis tasks; and communication tasks that address the methods and results of some data analysis (your own or somebody else's) using a mix of sketching and writing. Collaboration on homework is encouraged. I prefer that each student write and turn in solutions in their own words, and I think it is often best for this writing to be done separately, with collaboration limited to discussion of problems, sketching solutions on a whiteboard, etc. This will help you and me understand where you're at in terms of your proficiency with the material.
Each homework assignment, as well as a solution to the previous week's, will be posted Thursday at midnight and due the following Thursday at 11:59 PM. Assignments will be posted on and submitted via Canvas. So that I can post solutions and provide feedback without delay, late work will not be accepted.
The use of Large Language Models, e.g. GPT-4, to assist you in writing your homework solutions is encouraged. I use one to help me write almost everything, including the materials for this class. In particular, I use the GitHub Copilot extension for VS Code, and I'm happy to help you get that set up on your computer if you'd like. But I'll warn you that doing this well will involve a lot of editing. The perspective we take in this class is very different from the ones you'll find in most of the text these models were trained on. As a result, they tend to respond to my questions with what is, when you work through all the qualifications and jargon, a nonanswer. I expect real answers, in the terminology we use in class, that you're prepared to explain.
Please submit your work as a single PDF or HTML file with answers to each question in order and clearly labeled. And please try to keep your submissions concise. In particular, include code only if it's asked for explicitly and plots only if asked for explicitly or you're using them to illustrate a point you're making in your answer. In that case, your text should refer to the plot explicitly and explain where to look and what we're meant to see. In short, write your answers knowing that somebody is going to read them and would prefer not to work harder than necessary to understand what you're saying. I'm not asking for beautiful formatting. It's fine, for example, to write answers by hand, take photos, and stick them in a PDF as long as everything is legible, labeled, and in the right order.
Final grades will be a weighted average of scores on Quizzes (5%), Homework (25%), Two Midterms (20% each), and a Final Exam (30%).
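As a rough illustration of how this weighting works, here is a minimal R sketch. The component scores are made up, and the 0-100 scale is an assumption; nothing here says how individual scores are scaled, curved, or dropped, so treat this as arithmetic practice rather than the official calculation.

```r
# Weights taken from the grading breakdown above.
weights <- c(quizzes = 0.05, homework = 0.25,
             midterm1 = 0.20, midterm2 = 0.20, final = 0.30)

# Hypothetical component scores (0-100 scale assumed), for illustration only.
scores <- c(quizzes = 90, homework = 85,
            midterm1 = 78, midterm2 = 82, final = 88)

# Sanity check: the weights should account for the whole grade.
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Weighted average: multiply each score by its weight and add up.
final_grade <- sum(weights * scores[names(weights)])
print(final_grade)  # 84.15
```

Because the final exam carries the largest single weight, a 10-point change on it moves this average by 3 points, while a 10-point change in the quiz score moves it by only half a point.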