# Segev BenZvi

## PHY 403: Modern Statistics and Exploration of Large Datasets

Spring 2015
Bausch and Lomb 315: MW 9:30 - 10:50

PHY 403 is a graduate-level lecture course on probability and statistics. The class also includes a strong data analysis and numerical methods component centered around bi-weekly homework sets.

### Location and Office Hours

| Instructor | TA |
|---|---|
| Segev BenZvi | Brian Coopersmith |
| B&L 405 | B&L 373 |
| Tu 11-12, Th 2-3 | W 12:45-1:45 |

### Textbooks

There are two concise (and inexpensive) textbooks used in this course:

| Data Analysis: A Bayesian Tutorial | Statistical Data Analysis |
|---|---|
| D.S. Sivia and John Skilling | Glen Cowan |
| ISBN-10: 0198568320 | ISBN-10: 0198501552 |

The following books are not required but are great references. You may find them on reserve at POA or available as an online electronic reference on the River Campus Library website:

### Syllabus

• Probability
Interpretations of Probability. Frequentist statistics, Bayes' Theorem, Principle of Maximum Entropy.
• Basic Statistics
Random variables, discrete and continuous probability distributions, cumulative distributions. Mean, variance, and covariance. Central Limit Theorem. Method of moments.
• Common Probability Distributions
Gaussian, Binomial, Poisson, Exponential, Lognormal, Chi-square, Power Law, Cauchy (Breit-Wigner)
• Monte Carlo Methods
Random number generation, transformation of PDFs, acceptance-rejection technique.
• Bayesian Statistics
Likelihoods, priors, and posteriors. Nuisance parameters, systematic uncertainties, and marginalization. Numerical methods: Markov Chain Monte Carlo.
• Random and Systematic Uncertainties
Error bars and error propagation. Correlations and the "error matrix." Non-Gaussian uncertainties. Techniques for managing systematic uncertainties.
• Parameter Estimation
Maximum likelihood technique. Least squares regression. Minimization techniques. "Robust" alternatives to the least squares method.
• Hypothesis Testing
  • Frequentist approach: significance and power, Neyman-Pearson tests, statistical trials, likelihood ratio tests.
  • Bayesian approach: posterior odds, the Bayes Factor, the Ockham Factor.
• Interval Estimation
Confidence intervals (frequentist) and credible intervals (Bayesian). Lower and upper limits. The Feldman-Cousins ranking method.
• Classification
Multivariate techniques. Data classifiers and machine learning. Decision trees and boosting.
• Nonparametric Methods
Rank-order statistics. KS tests. Sign test and k-sample test. Contingency tables. Gaussian processes.
• Time Series Analysis and Correlations
Power spectra, periodograms, autocorrelation, cross-correlation. Detection of clusters in data.

| Homework | Class Participation | Midterm | Final Project |
|---|---|---|---|
| 45% | 10% | 15% | 30% |

Homework is assigned bi-weekly and has a significant programming component. Each assignment is due on Friday at 5 pm, two weeks after it is assigned. You can use any programming language you like (including Mathematica and R), but support from the TA and instructor is limited to Python and ROOT.

You may discuss the problems informally with your classmates but you must complete the homework on your own. Printouts of source code and plots are required to receive full credit.

The final project can be a data analysis of your choice, either reflecting your current work or presenting your own analysis of a previous result. You will present your results in a 20-minute presentation at the end of the semester (April 20-29).

### Lecture Notes

The homework assignments are available at my.rochester.edu.