# Segev BenZvi

## Courses

### Spring 2016

#### PHY 403: Modern Statistics and the Exploration of Large Datasets

This is my second time teaching this course on probability and statistics for graduate students and advanced undergraduates. The class includes a significant data analysis and numerical methods component centered around the Python programming language. (Mathematica, R, and ROOT are also allowed.)

##### Time and Location

MW 10:25 - 11:40 am
Bausch and Lomb 208

##### Textbooks

Data Analysis: A Bayesian Tutorial
D.S. Sivia and John Skilling
ISBN-10: 0198568320

Statistical Data Analysis
Glen Cowan
ISBN-10: 0198501552

The following books are not required but are great references. You may find them on reserve at POA or available as an online electronic reference on the River Campus Library website:

### Syllabus

• Probability
Interpretations of Probability. Frequentist statistics, Bayes' Theorem, Principle of Maximum Entropy.
• Basic Statistics
Random variables, discrete and continuous probability distributions, cumulative distributions. Mean, variance, and covariance. Central Limit Theorem. Method of moments.
• Common Probability Distributions
Gaussian, Binomial, Poisson, Exponential, Lognormal, Chi-square, Power Law, Cauchy (Breit-Wigner)
• Monte Carlo Methods
Random number generation, transformation of PDFs, acceptance-rejection technique.
• Bayesian Statistics
Likelihoods, priors, and posteriors. Nuisance parameters, systematic uncertainties, and marginalization. Numerical methods: Markov Chain Monte Carlo.
• Random and Systematic Uncertainties
Error bars and error propagation. Correlations and the "error matrix." Non-Gaussian uncertainties. Techniques for managing systematic uncertainties.
• Parameter Estimation
Maximum likelihood technique. Least squares regression. Minimization techniques. "Robust" alternatives to the least squares method.
• Hypothesis Testing
Frequentist approach: significance and power, Neyman-Pearson tests, statistical trials, likelihood ratio tests.
Bayesian approach: posterior odds, the Bayes Factor, the Ockham Factor.
• Interval Estimation
Confidence intervals (frequentist) and credible intervals (Bayesian). Lower and upper limits. The Feldman-Cousins ranking method.
• Classification
Multivariate techniques. Data classifiers and machine learning. Decision trees and boosting.
• Nonparametric Methods
Rank-order statistics. KS tests. Sign test and k-sample test. Contingency tables. Gaussian processes.
• Time Series Analysis and Correlations
Power spectra, periodograms, autocorrelation, cross-correlation. Detection of clusters in data.
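
As a rough illustration of the kind of numerical work the Monte Carlo topic involves, here is a minimal Python sketch of the acceptance-rejection technique. It is not taken from the course materials: the target PDF f(x) = (3/8)(1 + x²) on [-1, 1], the function names, and the random seed are all assumptions chosen for the example.

```python
import numpy as np


def accept_reject(pdf, x_min, x_max, pdf_max, n, rng):
    """Draw n samples from pdf on [x_min, x_max] by acceptance-rejection.

    pdf_max must be an upper bound on pdf over the interval.
    """
    samples = []
    while len(samples) < n:
        # Propose points uniformly in the box [x_min, x_max] x [0, pdf_max].
        x = rng.uniform(x_min, x_max, size=n)
        u = rng.uniform(0.0, pdf_max, size=n)
        # Keep only the proposals that fall under the curve.
        samples.extend(x[u < pdf(x)])
    return np.array(samples[:n])


def target_pdf(x):
    """Hypothetical example PDF, normalized on [-1, 1]; its maximum is 0.75."""
    return 0.375 * (1.0 + x**2)


rng = np.random.default_rng(403)  # arbitrary seed, for reproducibility only
draws = accept_reject(target_pdf, -1.0, 1.0, 0.75, 100_000, rng)
print(draws.mean(), draws.var())
```

For this PDF the exact mean is 0 and the exact variance is 2/5, so the printed values should land close to those numbers. The fraction of proposals accepted is the area under the curve divided by the area of the bounding box, here 1/1.5, or about 67%.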

| Component | Weight |
|---|---|
| Homework | 45% |
| Class Participation | 10% |
| Midterm | 15% |
| Final Project | 30% |

Homework is assigned every two weeks and will have a significant programming component. Each assignment is due on Friday at 5 pm, two weeks after it is assigned. You can use any programming language you like (including Mathematica and R), but support from the TA and instructor is limited to Python and ROOT.

You may discuss the problems informally with your classmates, but you must complete the homework on your own. Printouts of source code and plots are required to receive full credit.

The final project can be a data analysis of your choice, either reflecting your current work or re-analyzing a previous result. You will present your results in a 20-minute presentation at the end of the semester (April 20-29).

### Lecture Notes

The homework assignments are available at my.rochester.edu.