Segev BenZvi

Department of Physics and Astronomy, University of Rochester

PHY 403: Modern Statistics and Exploration of Large Datasets

Spring 2015
Bausch and Lomb 315: MW 9:30 - 10:50

PHY 403 is a graduate-level lecture course on probability and statistics. The class also includes a strong data analysis and numerical methods component centered around bi-weekly homework sets.

Location and Office Hours

Instructor: Segev BenZvi, B&L 405, office hours Tu 11-12 and Th 2-3
TA: Brian Coopersmith, B&L 373, office hours W 12:45-1:45

Textbooks

There are two concise (and inexpensive) textbooks used in this course:

Data Analysis: A Bayesian Tutorial, D.S. Sivia and John Skilling, ISBN-10: 0198568320
Statistical Data Analysis, Glen Cowan, ISBN-10: 0198501552

The following books are not required but are great references. You may find them on reserve at POA or available as an online electronic reference on the River Campus Library website:

Syllabus

Grading

Homework: 45%
Class Participation: 10%
Midterm: 15%
Final Project: 30%

Homework is assigned bi-weekly and has a significant programming component. Each assignment is due Friday at 5 pm, two weeks after it is assigned. You can use any programming language you like (including Mathematica and R), but support from the TA and instructor is limited to Python and ROOT.

You may discuss the problems informally with your classmates but you must complete the homework on your own. Printouts of source code and plots are required to receive full credit.

The final project can be a data analysis project of your choice, reflecting either your current work or your analysis of a previous result. You will present your results in a 20-minute presentation at the end of the semester (April 20-29).

Lecture Notes

Lecture 1 (Jan. 14): Course Intro
Different interpretations of probability. Sum rule, product rule, Bayes' Theorem, Law of Total Probability
Reading: Sivia Ch. 1; Cowan 1.1, 1.2
Lecture 2 (Jan. 21): Programming Primer
Basics of programming in Python. NumPy and Matplotlib extensions.
Lecture 3 (Jan. 26): Basic Statistics
PDFs and summary statistics: mean, mode, median; variance, covariance, correlation; histograms.
Reading: Cowan 1.3-1.5
Lecture 4 (Jan. 28): Common Probability Distributions
Binomial, negative binomial, multinomial, Gaussian, Poisson, Gamma, Exponential, Chi-square, Cauchy, Landau.
Reading: Cowan Ch. 2
Lecture 5 (Feb. 2): Monte Carlo Methods
Pseudo-random number generators. Simulating data from arbitrary PDFs.
Reading: Cowan Ch. 3
Lecture 6 (Feb. 4): Model Selection, Parameter Estimation
Odds ratio. Statistical trials. Marginalization and systematic uncertainties.
Reading: Sivia Ch. 2, 3
Lecture 7 (Feb. 9): Choosing Priors and Maximum Entropy
Principle of Indifference, uniform and Jeffreys priors, Principle of Maximum Entropy
Reading: Sivia Ch. 5
Lecture 8 (Feb. 11): PDF Estimators
Best estimators of a PDF: Bayesian and frequentist approaches. Quadratic approximations. Efficiency, consistency, and bias.
Reading: Sivia Ch. 2; Cowan Ch. 5
Lecture 9 (Feb. 16): Estimators with Correlations
Correlations between parameters: quadratic approximation in 2D, the Hessian matrix, the covariance matrix.
Reading: Sivia Ch. 3
Lecture 10 (Feb. 18): Minimization Techniques: Maximum Likelihood, Least Squares
Numerical methods and intro to maximum likelihood.
Reading: Sivia Ch. 3; Cowan Ch. 6
Lecture 11 (Feb. 23): Maximum Likelihood and Least Squares II
Properties of ML estimators. Variances and the Minimum Variance Bound. Goodness of fit.
Reading: Sivia Ch. 3; Cowan Ch. 6; NR in C Ch. 15
Lecture 12 (Feb. 25): Propagation of Uncertainties
Error propagation formula. Covariance matrix and correlations. Asymmetric error bars. Bayesian approach with the complete PDF.
Reading: Sivia Ch. 3; Cowan Ch. 1, 7
Lecture 13 (Mar. 2): Systematic Uncertainties
Systematic uncertainties and experimental design. How and when to assign systematic uncertainties.
Lecture 14 (Mar. 4): Methods for Propagating Systematics
Producing an error budget. The shift method. The covariance method. The pull method. Using Monte Carlo.
Reading: Barlow Ch. 4.4
Lecture 15 (Mar. 16): Model Selection and Hypothesis Testing
Posterior odds. Classical hypothesis testing: Type I and Type II errors. Using p-values.
Reading: Sivia 4.1-4.2; Cowan 4.1-4.4
Lecture 16 (Mar. 18): Likelihood Ratio Testing
Statistical significance and power. The Neyman-Pearson Lemma. Wilks' Theorem.
Reading: Cowan Ch. 4
Lecture 17 (Mar. 23): Sampling from PDFs: Markov Chain Monte Carlo
Sampling from high-dimensional PDFs with MCMC. The Metropolis-Hastings algorithm. The Principle of Detailed Balance. Practical details (burn-in, efficiency). Parallel tempering.
Lecture 18 (Mar. 25): Sampling from PDFs: Nested Sampling
Evaluating full posterior distributions. Likelihood ordering and Lebesgue integration. Multimodal PDFs.
Reading: Sivia Ch. 9
Lecture 19 (Mar. 30): Confidence Intervals
Credible intervals and confidence intervals. Upper and lower limits. Confidence belts, coverage, and "flip-flopping" between central intervals and limits.
Reading: Cowan Ch. 9
Lecture 20 (Apr. 6): Unfolding
Removing an instrumental response from data. Forward folding vs. unfolding. The variance problem. Regularization.
Reading: Cowan Ch. 11
Lecture 21 (Apr. 8): Spectral Analysis
Nyquist-Shannon Sampling Theorem. Fourier analysis and power spectral density. Schuster and Lomb-Scargle periodograms (with Bayesian derivations).
Lecture 22 (Apr. 13): Measurement and Bias
Bandwagon effects in experimental results. Confirmation bias: data selection and stopping criteria. Blind analyses.

The homework assignments are available at my.rochester.edu.

Additional Bibliography

In addition to the course texts and books on reserve, I also used online materials as resources for these lectures, including lecture notes from similar courses. In the interest of giving credit where it's due, here are some of the best resources out there:

Usage

Anyone who comes across this material and wishes to use it for their own courses is free to do so without requesting my permission. However, please cite S. BenZvi, Dept. of Physics and Astronomy, University of Rochester, 2015.