Abstract
In this paper, we consider how to obtain a robust empirical likelihood estimator for regression models. After introducing modal regression, we propose a novel empirical likelihood method based on modal regression estimating equations, which combines robustness with high inference efficiency relative to least-squares-based methods. Under mild conditions, we show that Wilks’ theorem continues to hold for the proposed empirical likelihood approach. The advantages of empirical likelihood modal regression as a nonparametric approach are illustrated by constructing confidence intervals/regions. Two simulation studies and a real data analysis confirm our theoretical findings.
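As background for the estimating equations used below, the modal regression estimator maximizes \(Q_h({\varvec{\beta }})=n^{-1}\sum _{i=1}^n \phi _h(y_i-{\varvec{x}}_i^T{\varvec{\beta }})\) for a kernel \(\phi _h\). A minimal numerical sketch with a Gaussian kernel, using an MEM-type iteratively reweighted least-squares iteration in the spirit of Yao and Li's modal EM (the function name, bandwidth, and simulation settings are ours, not the paper's):

```python
import numpy as np

def modal_regression(X, y, h=1.0, n_iter=200, tol=1e-8):
    """Maximize Q_h(beta) = mean_i phi_h(y_i - x_i' beta), phi_h Gaussian,
    by MEM-style iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)           # kernel weights: outliers -> ~0
        w /= w.sum()
        XtWX = X.T @ (X * w[:, None])             # sum_i w_i x_i x_i'
        beta_new = np.linalg.solve(XtWX, X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Simulated data with 10% gross outliers: the mode of the error stays at 0,
# so the modal fit should recover beta0 where OLS would be badly biased.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.3, size=n)
eps[:50] += 8.0
y = X @ np.array([1.0, 2.0]) + eps
beta_hat = modal_regression(X, y, h=0.5)
```

Each iteration is a weighted least-squares step whose weights downweight points far from the current fit, which is what yields the robustness discussed in the abstract.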
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China (11171112, 11001083, 11371142), the 111 Project of the Chinese Ministry of Education (B14019), the Doctoral Fund of the Ministry of Education of China (20130076110004), the Natural Science Project of the Jiangsu Province Education Department (13KJB110024), and the Natural Science Fund of Nantong University (13ZY001).
Appendix
1.1 Proof of Theorem 1
Proof
We first prove the root-\(n\) consistency of \({\hat{\varvec{\beta }}}\), i.e., \(\Vert \hat{\varvec{\beta }}-{\varvec{\beta }}_0\Vert =O_p(n^{-1/2})\). It is sufficient to show that for any given \(\varrho > 0\), there exists a large constant \(C\) such that
where the function \(Q_h(\cdot )\) is defined in (2).
For any vector \({\varvec{v}}\) with length \(C\), by the second-order Taylor expansion, we have
where \(\xi _i\) lies between \(\epsilon _i\) and \(\epsilon _i-n^{-1/2} {\varvec{x}}_i^T{\varvec{v}}\).
We study the magnitudes of \(I_1\), \(I_2\) and \(I_3\) in turn. Let \(A_n = \sum _{i=1}^n \phi _h^{\prime }(\epsilon _i)n^{-1/2} {\varvec{x}}_i\). It follows from condition (C1) and \(\mathrm{E}(\phi '_h(\epsilon )) = 0\) that
The finiteness of \(\mathrm{Var}( {\varvec{x}}_i )\) and \(G(h) = \mathrm{E}(\phi _h'(\epsilon )^2)\) implies that
Then by the central limit theorem, we have for fixed \(C\) that \( A_n \overset{d}{\longrightarrow }N( 0, G(h) \Sigma ) \), and therefore \(I_1 \overset{d}{\longrightarrow }N( 0, G(h) {\varvec{v}}^T\Sigma {\varvec{v}}) \).
For \(I_2\), by the strong law of large numbers, we have \( I_2 = \frac{1}{2}F(h){\varvec{v}}^T\Sigma {\varvec{v}} + o(1) \), where \(F(h)\) is defined in condition (C1).
For \(I_3\), we find that
Condition (C2) implies that \( \frac{1}{6n} \sum _{i=1}^n \rho _{h,c}(\epsilon _i)( {\varvec{x}}_i^T{\varvec{v}})^2 = O_p(1). \) It then follows from the fact that \(\max _{1\le i\le n} (\Vert {\varvec{x}}_i\Vert /\sqrt{n}) = o_p(1)\) that
Overall, we obtain that for any \({\varvec{v}}\) with \(\Vert {\varvec{v}}\Vert = C\),
with \(\delta _n = o_p(1)\). The fact \(- A_n^T{\varvec{v}} \overset{d}{\longrightarrow }N( 0, G(h) {\varvec{v}}^T\Sigma {\varvec{v}}) \) implies that for any \(\varrho >0\) and any nonzero \({\varvec{v}}\), there exists \(K>0\) such that
Thus with probability \(1-\varrho \), it holds that
Note that \(F(h)<0\). Clearly, when \(n\) and \(C\) are both large enough,
In summary, for any \(\varrho >0\), there exists \(C>0\) such that for \(\Vert {\varvec{v}}\Vert =C\), \(nQ_h({\varvec{\beta }}_{0}+ n^{-1/2} {\varvec{v}})-n Q_h({\varvec{\beta }}_{0})\) is negative with probability at least \(1-\varrho \). Thus, (14) holds. That is, with probability approaching 1, there exists a local maximizer \(\hat{\varvec{\beta }}\) such that \(\Vert \hat{\varvec{\beta }}-{\varvec{\beta }}_0\Vert =O_p(1/\sqrt{n})\).
We turn to proving the asymptotic normality of \( \hat{\varvec{\beta }}\). Denote \(\hat{\varvec{\gamma }}=\hat{\varvec{\beta }}-{\varvec{\beta }}_0\); then \(\hat{\varvec{\gamma }}\) satisfies the following equation
where \(\epsilon _i^*\) lies between \(\epsilon _i\) and \(\epsilon _i-{\varvec{x}}_i^T\hat{\varvec{\gamma }}\). We have shown that
Meanwhile, the fact \(\hat{\varvec{\gamma }}= O_p(n^{-1/2})\) and condition (C2) imply that \(J_3 = o_p(1)\). Thus Eq. (19) implies \( {\hat{\varvec{\gamma }}} = - J_2^{-1} J_1 + o_p(1). \) Since the bandwidth \(h\) is a constant not depending on \(n\), by Slutsky’s theorem, we have
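For concreteness, reading \(J_2 \overset{p}{\longrightarrow } F(h)\Sigma \) and \(\sqrt{n}\,J_1 \overset{d}{\longrightarrow } N(0, G(h)\Sigma )\) off the limits established for \(I_1\) and \(I_2\) above, the resulting limit presumably takes the usual sandwich form:

```latex
\sqrt{n}\,(\hat{\varvec{\beta }}-{\varvec{\beta }}_0)
  = -\sqrt{n}\,J_2^{-1} J_1 + o_p(1)
  \overset{d}{\longrightarrow }
  N\!\left(0,\; \frac{G(h)}{F(h)^2}\,\Sigma ^{-1}\right),
```

since \(J_2^{-1}\bigl(G(h)\Sigma \bigr)J_2^{-1} \rightarrow F(h)^{-1}\Sigma ^{-1}\,G(h)\Sigma \,F(h)^{-1}\Sigma ^{-1} = G(h)F(h)^{-2}\Sigma ^{-1}\); the sign of \(F(h)<0\) is immaterial because it enters squared.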
\(\square \)
The following lemma is needed to prove Theorem 2.
Lemma 1
Under the conditions of Theorem 1, the \({\varvec{\lambda }}_{\beta _0}\) in (10) satisfies \(\Vert {\varvec{\lambda }}_{\beta _0}\Vert =O_p(n^{-1/2})\).
Proof
Denote \({\varvec{\lambda }}_{\beta _0}=\zeta {\varvec{u}}_0\) with \({\varvec{u}}_0\) a unit vector and \(\zeta =\Vert {\varvec{\lambda }}_{\beta _0}\Vert \). Define the matrix \({\varvec{\Phi }}_n({\varvec{\beta }})=n^{-1} \sum _{i=1}^n \xi _i({\varvec{\beta }})\xi _i^T({\varvec{\beta }})\) and \(Z=\max _{1\le i\le n}\Vert \xi _i({\varvec{\beta }}_0)\Vert \). It follows from the definition of \({\varvec{\lambda }}_{\beta _0}\) that
which implies
By the Cauchy–Schwarz inequality and the law of large numbers, we have
This together with Eq. (17) gives
Condition (C1) and the law of large numbers imply \({\varvec{\Phi }}_n \mathop {\longrightarrow }\limits ^{\hbox { p}} G(h){\varvec{\Sigma }}\), which means that there exists \(c>0\) such that \(P({\varvec{u}}_0^T{\varvec{\Phi }}_n{\varvec{u}}_0>c)\rightarrow 1\) as \(n\rightarrow \infty \).
Furthermore, since \( {n}^{-1/2}\sum _{i=1}^n\xi _i({\varvec{\beta }}_0) \mathop {\longrightarrow }\limits ^{\hbox { d}} N(0, {\varvec{\Phi }}) \), we find that \(\Vert {\varvec{\lambda }}_{\beta _0}\Vert =O_p(n^{-1/2})\). \(\square \)
1.2 Proof of Theorem 2
Proof
Let \(y_i = {\varvec{\lambda }}^T_{\beta _0}\xi _i({\varvec{\beta }}_0)\). It follows from Lemma 1 that
which implies that the following Taylor expansion is valid. Applying a second-order Taylor expansion to \((1+y_i)^{-1}\) for \(i\) from 1 to \(n\), we obtain from Eq. (10) that
where \( r_n({\varvec{\beta }}_0)=(1/n)\sum _{i=1}^n\xi _i({\varvec{\beta }}_0) (1+\delta ^*_i)^{-1}\{{\varvec{\lambda }}^T_{\beta _0}\xi _i({\varvec{\beta }}_0)\}^2 \) and \(\delta ^*_i\) lies between \(0\) and \(y_i\). Clearly \(\max _{1\le i\le n}|\delta _i^*| = o_p(1)\). Therefore
Thus we have
Similarly, by a third-order Taylor expansion of \(\log (1+y_i)\) for all \(i\), we have
where \(\eta _i\) lies between \(0\) and \(y_i\). It can be verified that
Furthermore, by incorporating Eq. (24), we have
Since \( \xi _i({\varvec{\beta }}_0)={\varvec{x}}_i\phi _h^{\prime }( \epsilon _i) \), it follows from the conclusion of Lemma 1 that as \(n\rightarrow \infty \),
which immediately implies \(-2 l({\varvec{\beta }}_0) \mathop {\longrightarrow }\limits ^{\mathrm{d}} \chi ^2_p\). This completes the proof. \(\square \)
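To make the statistic concrete, here is a sketch of evaluating \(-2l({\varvec{\beta }}_0)\) from the modal estimating functions \(\xi _i({\varvec{\beta }})={\varvec{x}}_i\phi _h'(\epsilon _i)\), assuming a Gaussian kernel; the Newton solver for the Lagrange multiplier \({\varvec{\lambda }}\) follows Owen's standard dual formulation, and all function names are ours:

```python
import numpy as np

def el_logratio(xi):
    """-2 log empirical likelihood ratio for E[xi] = 0, xi of shape (n, p).
    Solves the dual problem max_lambda sum_i log(1 + lambda' xi_i)
    by damped Newton; the stationarity condition is Owen's
    sum_i xi_i / (1 + lambda' xi_i) = 0."""
    n, p = xi.shape
    lam = np.zeros(p)
    for _ in range(50):
        t = 1.0 + xi @ lam
        g = (xi / t[:, None]).sum(axis=0)             # gradient
        H = -(xi / t[:, None]).T @ (xi / t[:, None])  # Hessian (neg. definite)
        step = np.linalg.solve(H, -g)                 # Newton ascent direction
        a = 1.0                                       # damp to keep 1 + lam'xi > 0
        while np.any(1.0 + xi @ (lam + a * step) <= 1e-10):
            a *= 0.5
        lam_new = lam + a * step
        if np.max(np.abs(lam_new - lam)) < 1e-10:
            lam = lam_new
            break
        lam = lam_new
    return 2.0 * np.sum(np.log(1.0 + xi @ lam))

def modal_el_stat(X, y, beta, h):
    """-2 l(beta) with xi_i(beta) = x_i * phi_h'(eps_i), Gaussian phi_h."""
    eps = y - X @ beta
    phi = np.exp(-0.5 * (eps / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    dphi = -(eps / h ** 2) * phi                      # phi_h'(eps_i)
    return el_logratio(X * dphi[:, None])

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, 2.0])
y = X @ beta0 + rng.normal(scale=0.5, size=n)
stat = modal_el_stat(X, y, beta0, h=0.7)
```

By Theorem 2, `stat` is asymptotically \(\chi ^2_p\) at the true \({\varvec{\beta }}_0\), so a \(95\,\%\) confidence region collects those \({\varvec{\beta }}\) whose statistic does not exceed the \(\chi ^2_{p}\) quantile, with no variance estimation required.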
Zhao, W., Zhang, R., Liu, Y. et al. Empirical likelihood based modal regression. Stat Papers 56, 411–430 (2015). https://doi.org/10.1007/s00362-014-0588-4