Preface
Resources
R package
Python code
Session info
About the Author
Part I
1 Introductory topics
2 Concentration inequalities
2.1 Empirical mean
2.2 Simple non-asymptotic concentration inequalities
2.2.1 Markov’s inequality
2.2.2 Chebychev’s inequality
2.3 Asymptotic concentration inequalities
2.3.1 Central Limit Theorem
2.4 Exponential non-asymptotic concentration inequalities
2.4.1 Chernoff bounds
2.4.2 Hoeffding’s inequality
2.4.3 Bernstein’s inequality
2.5 Examples
2.6 Appendix
2.6.1 Example of a moment generating function
3 Optimization
3.1 Line search
3.1.1 Methodology
3.1.2 Results
4 Empirical risk minimization
4.1 Excess risk and overfitting error
4.1.1 Data splitting
4.1.2 Leave-one-out cross-validation
4.1.3 Realizable case
4.2 Rademacher averages
4.2.1 Finite class of classifiers
4.2.2 Infinitely many classifiers
4.3 Towards VC theory
5 VC theory
5.1 VC dimension
5.1.1 Feature maps
5.1.2 Examples
5.2 Structural risk minimization
6 Classification
6.1 Binary classification
6.2 Nearest Neighbour
6.2.1 1NN
6.2.2 KNN
6.3 Linear Classification
6.3.1 Perceptron
7 Regression
7.1 Ordinary least-squares
7.2 Weighted least-squares
7.3 Logistic regression
7.4 Appendix
7.4.1 Weighted least-squares
7.4.2 Iteratively reweighted least-squares
8 Complexity regularization
8.1 Bias-variance tradeoff
9 Dimensionality reduction
9.1 Random projections
9.2 PCA
9.2.1 The maths behind PCA
9.2.2 An intuitive example
9.3 PCA for feature extraction
9.3.1 Squared elements of eigenvectors
9.3.2 SVD
9.4 High-dimensional data
9.4.1 Regularized SVD
9.4.2 Fast, partial SVD
9.5 Forward search
10 Subsampling
10.1 Motivation
10.2 Subsampling methods
10.2.1 Uniform subsampling (UNIF)
10.2.2 Basic leveraging (BLEV)
10.2.3 Predictor-length sampling (PL)
10.2.4 Comparison of methods
10.3 Linear regression model
10.3.1 A review of Zhu et al. (2015)
10.3.2 Computational performance
10.4 Classification problems
10.4.1 Optimal subsampling for classification problems
10.4.2 Synthetic data
10.4.3 Real data example
10.5 Conclusion
10.6 Appendix
10.6.1 From SVD to leverage scores
10.6.2 From optimal to predictor-length subsampling
10.6.3 Synthetic data
10.6.4 Subsampling applied to sinusoidal function
11 Outliers
11.1 Trimmed mean estimator
11.2 Median-of-means estimator
References
Appendix
From Scratch
Chapter 1 Introductory topics
This chapter provides a whistle-stop tour of simple but important concepts…