Chapter 11 Outliers
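The empirical mean \(m_n = \frac{1}{n}\sum_{i=1}^{n} x_i\) is sensitive to outliers: a single sufficiently extreme observation can move it arbitrarily far from the true mean \(m\).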
11.1 Trimmed mean estimator
One way of dealing with outliers is simply to remove them. For mean estimation, the corresponding estimator is called the trimmed mean estimator, \(m_n^{(k)}\): it ignores the \(k\) largest and \(k\) smallest values and averages the remaining \(n-2k\). We have
\[ \begin{equation} \begin{aligned} && \mathbb{E}m_n^{(k)}&= \mathbb{E} \frac{1}{n-2k} \sum_{i=1}^{n} \mathbb{1}_{x_i \notin \text{top/bottom } k} \, x_i\\ \end{aligned} \tag{11.1} \end{equation} \] One can show that if \(k \approx \log( \frac{1}{\delta})\), then with probability at least \(1-\delta\)
\[ \begin{aligned} && |m_n^{(k)}-m|&\leq c \sqrt{ \frac{\sigma^2 \log( \frac{1}{\delta})}{n}} \\ \end{aligned} \] where \(m\) is the true mean, \(\sigma^2\) is the variance of the data and \(c\) is a constant.
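As an illustration, the trimming step can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `trimmed_mean` is hypothetical, NumPy is assumed to be available, and \(k\) is passed in directly rather than being derived from \(\delta\).

```python
import numpy as np


def trimmed_mean(x, k):
    """Average of x after dropping the k smallest and k largest values.

    Hypothetical helper for illustration; k is supplied by the caller.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if k < 0 or n - 2 * k <= 0:
        raise ValueError("need 0 <= k and n - 2k > 0")
    # After sorting, the k smallest values sit at the front and the
    # k largest at the back, so the slice keeps the middle n - 2k points.
    return float(x[k:n - k].mean())


data = [1, 2, 3, 4, 1000]           # one gross outlier
print(float(np.mean(data)))         # 202.0, dominated by the outlier
print(trimmed_mean(data, k=1))      # 3.0, outlier removed
```

With \(k=0\) the function reduces to the ordinary empirical mean, which makes the effect of trimming easy to compare on the same data.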
11.2 Median-of-means estimator
Another idea is to compute the empirical means of disjoint subsets of the data and take the median of those. In particular, divide the data into \(k\) blocks of \(l\) points each, so that \(n = kl\). For each block \(j = 1, \dots, k\) compute \(m_n^{(j)}= \frac{1}{l}\sum_{i=(j-1)l+1}^{jl}x_i\). Then the median-of-means estimator is simply:
\[ \begin{equation} \begin{aligned} && m_n&=\text{median}(m_n^{(1)},...,m_n^{(k)}) \\ \end{aligned} \tag{11.2} \end{equation} \]
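The block construction in (11.2) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `median_of_means` is hypothetical, NumPy is assumed to be available, blocks are formed from consecutive points in the given order, and when \(n\) is not a multiple of \(k\) the leftover points are dropped (one possible convention among several).

```python
import numpy as np


def median_of_means(x, k):
    """Median of the k block means of x.

    Hypothetical helper for illustration. Blocks are consecutive runs of
    l = n // k points; any remaining n - k*l points are ignored.
    """
    x = np.asarray(x, dtype=float)
    if k <= 0:
        raise ValueError("k must be positive")
    l = x.size // k
    if l == 0:
        raise ValueError("need at least k data points")
    # Row j of `blocks` holds the l points of block j.
    blocks = x[:k * l].reshape(k, l)
    block_means = blocks.mean(axis=1)
    return float(np.median(block_means))


data = [1, 2, 3, 4, 5, 1000]        # one gross outlier
# Blocks: [1, 2], [3, 4], [5, 1000] with means 1.5, 3.5, 502.5
print(median_of_means(data, k=3))   # 3.5
```

The outlier can corrupt only the block it falls into, so the median of the block means is unaffected as long as fewer than half of the blocks are contaminated. If the data are not already in random order, shuffling them before forming blocks is a reasonable precaution.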