A Global Encompassing Criterion for Nonparametric Regression Models

Christophe Bontemps    Jean-Pierre Florens1
      Gremaq            Gremaq and IDEI

University of Social Sciences, Toulouse, France

January 10th, 95


Abstract: The concept of encompassing aims at validating or invalidating a tentative econometric model by testing its capacity to account for results obtained under alternative specifications.

1   Introduction




One of the most important scientific activities has been, and still is, the comparison of theories and models. It is often emphasized that a new model or theory must advance our knowledge of a phenomenon; it seems equally important, however, to check that this new model or theory is able to explain what older, ``established'' models already explained.

Encompassing therefore formalizes a research strategy which has been applied in many areas of science. This principle was introduced in econometrics by Hendry, Mizon and Richard (see Mizon [33], Mizon and Richard [34], and Hendry and Richard [30]) and has been developed recently through the work of Gourieroux, Monfort and Trognon (see Gourieroux and Monfort [17] and [18], or Gourieroux, Monfort and Trognon [20]), Lu and Mizon [31], or Govaerts, Hendry and Richard [21] (see Bontemps [5] for a survey). Within a Bayesian context, Florens, Hendry and Richard [15] have recently provided a theoretical framework for interpreting encompassing as a notion of ``sufficiency'' among models. Each of these papers presents an application to the choice of regressors, applying parametric encompassing in either a ``static'' linear, a dynamic, or a Bayesian framework.

In parallel, increasing attention has been paid over recent years to the development of robust techniques of inference, with special emphasis on nonparametric methods that reduce the impact of specification errors (see Collomb [9] and [10], Bierens [4], or the numerous publications of Härdle [24], Härdle and Marron [27], Härdle and Mammen [26], etc.).

The contribution of this paper is that of integrating recent developments in these two areas of research. Our motivation is to describe and analyze the encompassing principle in regression models where we let the models be free of any functional shape.

The concept of encompassing used here is in line with that of parametric encompassing as defined by Mizon and Richard [34]. Formal definitions are offered below, but a brief presentation clarifies the encompassing procedure. Let M1 and M2 denote two regression models of a variable Y with respective regressors X and Z, and let fn and gn be two estimators of the regression functions associated with these models. One then defines the pseudo-true value G(f) of gn as the plim of gn under M1. The basis of the comparison is the encompassing difference between gn and an estimator G(fn) of its pseudo-true value, which is here a function of z. M1 encompasses M2 if that difference (the lack of encompassing) is not significant relative to its asymptotic sampling distribution.

The paper is organized as follows. The following section introduces the notation and assumptions. The estimators associated with the models and the pseudo-true value are defined in section 3. The nonparametric encompassing statistics are then defined in section 4, where we derive a basic and a global statistic and analyze their asymptotic behavior; this leads to a Bootstrap approach in section 5. The simulation results of this Bootstrap procedure are given in the final section.

2   Notations and models

Let the full set of variables required by, and available to, an investigator in order to estimate, test and analyze competing regression models be denoted by $S=(Y,X,Z)$, and let $(S_i)_{i=1,\dots,n}$ be n observations where $Y_i \in \mathbb{R}$, $X_i \in \mathbb{R}^p$, and $Z_i \in \mathbb{R}^q$. The variables $X_i$ and $Z_i$ represent the conditioning variables associated with the models M1 and M2.

Formally, $(S_i)_{i=1,\dots,n}$ constitutes a centered, square integrable process defined on a probability space $(\Omega, \mathcal{A}, P_0)$. The probability $P_0$ is unknown and we shall restrict our attention to functions defined from it.

We assume that the process $(S_i)_{i=1,\dots,n}$ is i.i.d., hence its distribution is fully characterized by a single observation which is itself described by its density2 $\varphi(\cdot)$ with respect to the Lebesgue measure in $\mathbb{R}^{p+q+1}$.

The components of $(X_i,Z_i)$ are assumed to be linearly independent from each other. The latter assumption can be relaxed, allowing for common components in $X_i$ and $Z_i$ or, more generally, for lack of linear independence between components of $X_i$ and $Z_i$. In such cases, the density $\varphi$ would be taken with respect to the Lebesgue measure restricted to the appropriate subspace of $\mathbb{R}^{p+q+1}$.

We use the following notations to represent the regression functions3 :

\[
f(\cdot) = E[\,Y \mid X = \cdot\,], \qquad g(\cdot) = E[\,Y \mid Z = \cdot\,]
\]

The process $(S_i)_{i=1,\dots,n}$ being square integrable, the functions f and g are themselves square integrable on $(\Omega, \mathcal{A}, P_0)$.

In the rest of the paper we will assume the following regularity condition :

Hypothesis 2.1   There exist continuous versions of the regression functions f and g as well as continuous versions of the joint, marginal and conditional densities (represented here by the same function $\varphi$).

The ``encompassing'' model M1, to be validated, is based on the exclusion of the variable Z from the regression. This exclusion is stated through the following hypothesis H1 (mean-conditional independence) :

\[
H_1 : \quad E[\,Y \mid X, Z\,] = E[\,Y \mid X\,]
\]

The model ``to be encompassed'' M2 is based on the regression with Zi as sole regressors. From the perspective of the ``owner'' of M1, this model is not of special interest, and is purely instrumental in the construction of encompassing tests aimed at validating M1.

In our context, both models will be associated with a nonparametric estimator of the regression function involved in its definition. A variety of nonparametric estimators of regression functions are now available (see Härdle[24]), but we will focus our attention on kernel estimators.

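We use the Nadaraya [35] and Watson [41] estimator for f and g, given by :
\[
f_n(x) = \frac{\dfrac{1}{n h_n^p}\displaystyle\sum_{i=1}^{n} Y_i\, K\!\left(\frac{X_i - x}{h_n}\right)}{\dfrac{1}{n h_n^p}\displaystyle\sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h_n}\right)}
\]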

with the convention $f_n(x)=0$ if the denominator $\frac{1}{n h_n^p}\sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h_n}\right)$ (an estimator of the marginal density $\varphi(x)$) is zero.

In these expressions K denotes a Parzen-Rosenblatt kernel, i.e. a mapping from $\mathbb{R}^p$ to $\mathbb{R}$ which is integrable with respect to the Lebesgue measure, integrates to one, and satisfies the limit condition :
\[
\lim_{\|x\| \to \infty} \|x\|^{p} \cdot K(x) = 0
\]
where $\|\cdot\|$ denotes the Euclidean norm.
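The estimator above can be sketched in a few lines of code. This is a minimal illustration for the scalar case (p = 1) only; the Gaussian kernel and the function names `gaussian_kernel` and `nadaraya_watson` are assumptions made for the example, not choices stated in the paper.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: integrable, integrates to one, and
    |u| * K(u) -> 0 as |u| -> infinity, so it qualifies as a kernel here."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Estimate f_n(x) = sum_i Y_i K((X_i - x)/h) / sum_i K((X_i - x)/h).

    The common factor 1/(n h^p) cancels between numerator and denominator.
    Returns 0.0 when the denominator is zero, following the text's convention.
    """
    weights = [kernel((xi - x) / h) for xi in xs]
    denominator = sum(weights)
    if denominator == 0.0:
        return 0.0
    return sum(w * yi for w, yi in zip(weights, ys)) / denominator
```

The same function serves for the second model by passing the observations of Z and the window-width of that model in place of `xs` and `h`.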

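The estimator of g is defined accordingly :
\[
g_n(z) = \frac{\dfrac{1}{n k_n^q}\displaystyle\sum_{i=1}^{n} Y_i\, K\!\left(\frac{Z_i - z}{k_n}\right)}{\dfrac{1}{n k_n^q}\displaystyle\sum_{i=1}^{n} K\!\left(\frac{Z_i - z}{k_n}\right)}
\]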

In order to lighten the notation, we shall use a common symbol K for the kernels involved in these estimators, though they may obviously differ from each other, in particular in their dimension.

The window-widths $h_n$ and $k_n$ also differ, and their convergence rates must be adjusted to the dimensions of the regressors. We shall assume the traditional convergence conditions for these sequences :

Hypothesis 2.2   (Window-width minimal conditions)

The window-widths $h_n$ and $k_n$ satisfy :
\[
\lim_{n\to\infty} h_n = 0 \quad\text{and}\quad \lim_{n\to\infty} n\cdot h_n^{p} = \infty
\]
\[
\lim_{n\to\infty} k_n = 0 \quad\text{and}\quad \lim_{n\to\infty} n\cdot k_n^{q} = \infty
\]

These conditions ensure the consistency of $f_n(x)$ and $g_n(z)$ in their respective models.

In order to simplify this study, we shall assume a homoscedasticity condition.

Hypothesis 2.3   (Homoscedasticity)

Under M1, $\mathrm{Var}[\,Y \mid X,Z\,] = \sigma^2$.

The previous assumptions, namely i.i.d. sampling, square integrability, continuous versions (Hyp. 2.1), window-width conditions (Hyp. 2.2), and homoscedasticity (Hyp. 2.3), are maintained for the rest of the paper; from now on we only list additional conditions.

3   Pseudo-true value




Encompassing being a notion linking ``estimated models'', that is models associated to estimators (see Florens, Hendry and Richard [15], or Bontemps [5], for a general discussion of encompassing), we have to derive the convergence properties of these estimators. In line with the objectives of our encompassing analysis we shall derive their limits under the mean-conditional independence assumption associated with M1.

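Theorem 3.1   Under H1, we have :
\[
\text{i)}\quad f_n(x) \underset{n\to\infty}{\longrightarrow} f(x) \qquad \forall x
\]
\[
\text{ii)}\quad g_n(z) \underset{n\to\infty}{\longrightarrow} E[\,f(x) \mid Z=z\,] \qquad \forall z
\]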

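Proof :

The proof follows from the properties of kernel estimators (see Bosq and Lecoutre [7]). Under the minimal conditions assumed, a kernel estimator of a conditional expectation converges in probability to the latter at every point, so :
\[
\text{i)}\quad f_n(x) \underset{n\to\infty}{\longrightarrow} E[\,Y \mid X=x\,] \qquad \forall x
\]
\[
\text{ii)}\quad g_n(z) \underset{n\to\infty}{\longrightarrow} E[\,Y \mid Z=z\,] \qquad \forall z
\]

Under H1, we have :
\[
\begin{aligned}
E[\,Y \mid Z=z\,] &= E\big[\,E[\,Y \mid X=x, Z=z\,] \mid Z=z\,\big] \\
&= E\big[\,E[\,Y \mid X=x\,] \mid Z=z\,\big] \\
&= E[\,f(x) \mid Z=z\,]
\end{aligned}
\]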
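Part ii) of the theorem can be checked numerically on simulated data. The sketch below is an added illustration, not part of the paper: the data-generating process (a linear f, Gaussian noise), the sample size and the window-width are all assumptions chosen so that H1 holds by construction and the limit is known in closed form.

```python
import math
import random


def nw(point, regressors, responses, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (scalar regressor)."""
    weights = [math.exp(-0.5 * ((r - point) / bandwidth) ** 2) for r in regressors]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * y for w, y in zip(weights, responses)) / total


def simulate_under_h1(n, seed=0):
    """Draw (Y, X, Z) with E[Y | X, Z] = f(X) = 2 X, so H1 holds by design.

    X = Z + U with Z ~ N(0, 1) and U ~ N(0, 0.25), hence E[f(X) | Z = z] = 2 z.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [z + rng.gauss(0.0, 0.5) for z in zs]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return ys, xs, zs


ys, xs, zs = simulate_under_h1(n=2000)
z0 = 0.5
g_n = nw(z0, zs, ys, bandwidth=0.2)   # estimator attached to M2, evaluated at z0
pseudo_true = 2.0 * z0                # E[f(X) | Z = z0] in this simulated design
```

In this design the estimate `g_n` should lie close to `pseudo_true`, and the gap should shrink as n grows and the window-width decreases at a suitable rate.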



This property leads us to the definition of the pseudo-true value associated with $g_n(z)$ under H1. It is defined in the same spirit as the classical ``parametric'' pseudo-true value associated with an estimator of a parameter of interest in M2. According to Hendry and Richard [30], the pseudo-true value of $g_n$ is given by its plim under M1.

Definition 3.1   The pseudo-true value G associated with $g_n(z)$ under H1 is
\[
G(f)(z) = E[\,f(x) \mid Z=z\,]
\]

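This pseudo-true value is a reinterpretation of $g_n(z)$ under the belief that M1 is the ``true'' model. It is estimated in the same way as $g_n(z)$, with $Y_i$ replaced by the fitted value $f_n(X_i)$ :
\[
G(f_n)(z) = \frac{\dfrac{1}{n k_n^q}\displaystyle\sum_{i=1}^{n} f_n(X_i)\cdot K\!\left(\frac{Z_i - z}{k_n}\right)}{\dfrac{1}{n k_n^q}\displaystyle\sum_{i=1}^{n} K\!\left(\frac{Z_i - z}{k_n}\right)}
\]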
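The estimation of the pseudo-true value amounts to two successive smoothing steps, which the following sketch makes explicit for scalar X and Z. The Gaussian kernel and the names `nw` and `pseudo_true_value_estimate` are assumptions of this example; plugging the fitted values of the first model into the second smoother is how the formula above is read here.

```python
import math


def nw(point, regressors, responses, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (scalar regressor)."""
    weights = [math.exp(-0.5 * ((r - point) / bandwidth) ** 2) for r in regressors]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * y for w, y in zip(weights, responses)) / total


def pseudo_true_value_estimate(z, ys, xs, zs, h, k):
    """Estimate of G(f)(z): smooth the fitted values of M1 on the Z_i.

    Step 1: fitted values f_n(X_i) from the regression of Y on X (width h).
    Step 2: regression of these fitted values on Z, evaluated at z (width k).
    """
    fitted = [nw(x, xs, ys, h) for x in xs]
    return nw(z, zs, fitted, k)
```

The first step costs on the order of n squared kernel evaluations, so for repeated evaluation over many points z it is worth computing the fitted values once and reusing them.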

4   Nonparametric encompassing

4.1   Basic statistic

As usual in a nonparametric framework, we have to make some regularity assumptions and to impose some degree of smoothness on the functions involved in our statistics.

Hypothesis 4.1   (Regularity)


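Hypothesis 4.2   (Kernels orders)

The Parzen-Rosenblatt kernels involved in $f_n$ and $g_n$ must satisfy the following conditions :
\[
\int_{\mathbb{R}^p} \prod_{i=1}^{p} x_i^{\alpha_i}\, K(x_1,x_2,\dots,x_p)\,dx_1\cdots dx_p =
\begin{cases}
1 & \text{if } \alpha_i = 0,\ \forall\, i = 1,\dots,p \\[4pt]
0 & \text{if } 0 < \displaystyle\sum_{i=1}^{p} \alpha_i < d
\end{cases}
\]
and
\[
\int_{\mathbb{R}^p} |x_i|^{m}\, |K(x_1,x_2,\dots,x_p)|\,dx_1\cdots dx_p < \infty, \qquad \forall x \in \mathbb{R}^p
\]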

In order to have positive kernels, we impose d=2.
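With d = 2 the moment condition reduces, in the scalar case, to a kernel that integrates to one and has a vanishing first moment; the second moment is left unrestricted, which is what allows a positive kernel. The sketch below checks this numerically for the Gaussian kernel. The kernel, the integration range and the trapezoidal rule are assumptions made for the illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an example of a positive kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_moment(order, kernel=gaussian_kernel, lower=-10.0, upper=10.0, steps=20000):
    """Trapezoidal approximation of the integral of u**order * K(u) du."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lower + i * width
        weight = 0.5 if i in (0, steps) else 1.0   # end points count for one half
        total += weight * (u ** order) * kernel(u)
    return total * width


# Moments of order 0, 1 and 2 of the Gaussian kernel.
moments = [kernel_moment(j) for j in range(3)]
```

For this kernel the moments of order 0 and 1 come out as one and zero up to numerical error, while the moment of order 2 is strictly positive, as it must be for any positive kernel.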



Let $h_n$ and $k_n$ be the window-widths associated with the estimators $f_n(x)$ and $g_n(z)$ respectively. These window-widths must satisfy :

Hypothesis 4.3
\[
\lim_{n\to\infty} n\cdot h_n^{\,p+2d} = 0 \quad\text{and}\quad \lim_{n\to\infty} n\cdot k_n^{\,q+2d} = 0
\]

These latter conditions serve to ``kill the bias'' remaining asymptotically from the estimation of the regression functions and are standard in nonparametric estimation (see Bierens [4]).

To these three hypotheses, which ensure the asymptotic normality of the estimators, we must add a condition on the relation between the rates of convergence of the two smoothing parameters used :

Hypothesis 4.4   The window-widths $h_n$ and $k_n$ must moreover satisfy :
\[
\log(n)\cdot\frac{k_n^{q}}{h_n^{p}} \underset{n\to\infty}{\longrightarrow} 0
\]

In the univariate case (p=q=1), the latter condition means heuristically that the window-width $k_n$ must converge to zero ``faster'' than $h_n$.



The encompassing statistic in this nonparametric context is based on the encompassing difference, which is here the function $\delta_{f,g}(z)$ built on the difference between the estimator associated with M2, $g_n$, and an estimator of its pseudo-true value, $G(f_n)$, i.e. :
\[
\delta_{f,g}(z) = g_n(z) - G(f_n)(z)
\]

Under the previous hypotheses, Bontemps, Florens and Richard [6] have derived the asymptotic behavior of this statistic, once normalized.

Theorem 4.1   Under H1 and hypotheses 4.1 and 4.2, and if the window-widths satisfy hypotheses 4.3 and 4.4, we get :
\[
\sqrt{n\cdot k_n^{q}}\;\delta_{f,g}(z) \overset{D}{\longrightarrow} N\!\left(0,\ \frac{\sigma^2 \int K^2}{\varphi(z)}\right)
\]

A proof of this result is given in the mathematical appendix.
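One possible way to use this pointwise result in practice is to standardize the encompassing difference at a given z. The sketch below does so for scalar X and Z under several assumptions that are not taken from the paper: a Gaussian kernel, a normalization by the square root of n times the window-width, a plug-in estimate of the residual variance from the first model, and a kernel estimate of the density of Z. The names `pointwise_statistic`, `kernel` and `nw` are hypothetical.

```python
import math
import random


def kernel(u):
    """Standard normal density (assumed kernel for this sketch)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw(point, regressors, responses, bandwidth):
    """Nadaraya-Watson estimate at `point` for a scalar regressor."""
    weights = [kernel((r - point) / bandwidth) for r in regressors]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * y for w, y in zip(weights, responses)) / total


def pointwise_statistic(z, ys, xs, zs, h, k):
    """Return (delta, t) at z, where t standardizes sqrt(n k) * delta.

    Plug-in choices (assumptions of this sketch): sigma^2 is estimated by the
    mean squared residual of M1, and the density of Z at z by a kernel estimate.
    """
    n = len(ys)
    fitted = [nw(x, xs, ys, h) for x in xs]
    delta = nw(z, zs, ys, k) - nw(z, zs, fitted, k)
    sigma2 = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / n
    density = sum(kernel((zi - z) / k) for zi in zs) / (n * k)
    int_k2 = 1.0 / (2.0 * math.sqrt(math.pi))   # integral of K^2 for the Gaussian K
    variance = sigma2 * int_k2 / density
    return delta, math.sqrt(n * k) * delta / math.sqrt(variance)


# Simulated data in which H1 holds by construction (illustrative design).
rng = random.Random(1)
zs = [rng.gauss(0.0, 1.0) for _ in range(200)]
xs = [z + rng.gauss(0.0, 0.5) for z in zs]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
delta, t = pointwise_statistic(0.0, ys, xs, zs, h=0.4, k=0.2)
```

The value `t` would be compared with standard normal quantiles; since the comparison is local to the chosen z, different evaluation points can lead to different conclusions.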


The asymptotic behavior of this statistic is in line with the (parametric) normality obtained in the classical asymptotic study of the encompassing statistic in parametric cases (see Hendry and Richard [30]). Nevertheless, the local (because functional) character of this statistic may be disappointing, or ambiguous to use in practice. These considerations motivate the construction of a global encompassing criterion, similar to the quadratic parametric criterion provided by Mizon [33] or Mizon and Richard [34], which may be more useful.

This statistic may rest on different types of criteria :

Or any criterion giving a global vision of the encompassing difference, using any type of distance or norm.

The convergence rates $\alpha_1(n)$ and $\alpha_2(n)$ are then chosen in order to derive an asymptotic convergence to a distribution upon which the test is based. In a parametric context, the parametric ``n'' rate of convergence is used to derive a $\chi^2$ distribution from which Wald Encompassing Tests are derived (see Hendry and Richard [30], or Lu and Mizon [31]). Obviously, in this nonparametric framework, these rates may involve the window-widths in their definition.

4.2   A global Criterion

After some work, the simulations on the first criterion gave very disappointing results, so we present here a more interesting result concerning the ``integral type'' criterion, which is :
\[
\Psi = n\cdot k_n\cdot \int \big(\delta_{f,g}(z)\big)^2\, \varpi(z)\,dz
\]
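A numerical version of this integral criterion can be sketched as follows for scalar X and Z. The trapezoidal rule over a user-supplied grid, the Gaussian kernel and the default constant weight are assumptions of this example; in particular the constant weight is only a placeholder and not the weight function used in the paper. The names `nw` and `global_criterion` are hypothetical.

```python
import math


def nw(point, regressors, responses, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (scalar regressor)."""
    weights = [math.exp(-0.5 * ((r - point) / bandwidth) ** 2) for r in regressors]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * y for w, y in zip(weights, responses)) / total


def global_criterion(ys, xs, zs, h, k, grid, weight=lambda z: 1.0):
    """Approximate n * k * integral of delta(z)^2 * weight(z) dz.

    delta(z) is the encompassing difference g_n(z) - G(f_n)(z).
    The integral is taken by the trapezoidal rule over `grid` (sorted points).
    The default constant weight is a placeholder, not the paper's weight.
    """
    n = len(ys)
    fitted = [nw(x, xs, ys, h) for x in xs]   # f_n(X_i), computed once
    values = []
    for z in grid:
        delta = nw(z, zs, ys, k) - nw(z, zs, fitted, k)
        values.append(delta * delta * weight(z))
    integral = sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )
    return n * k * integral
```

A call such as `global_criterion(ys, xs, zs, h, k, grid)` returns a single non-negative number; restricting the grid to the interior of the support of Z avoids regions where the kernel estimates rest on very few observations.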

The asymptotic behavior of this criterion has been established and is presented in this section. We need to introduce some additional hypotheses to derive the next result. For simplicity we place this work in the univariate case (p=q=1).

The existence and regularity of the following conditional moment are assumed.

Hypothesis 4.5   Let $f_4$ denote the following function
\[
f_4(x) = E\Big[\,\big(Y - f(x)\big)^4 \;\Big|\; X = x\,\Big]
\]
we assume that
\[
\int f_4(x)\,\varphi(x)\,dx < \infty
\]

As in the previous section, we need to impose a condition on the relation between the rates of convergence of the two smoothing parameters $h_n$ and $k_n$.

Hypothesis 4.6   The window-widths $h_n$ and $k_n$ must satisfy :
\[
\log(n)\cdot\frac{k_n}{h_n} \underset{n\to\infty}{\longrightarrow} 0
\]
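As a worked illustration of how these rate conditions interact (an added example under assumed power-law window-widths, not a choice made in the paper), take $p=q=1$, $d=2$, $h_n = n^{-a}$ and $k_n = n^{-b}$ with $a, b > 0$ :

\[
\begin{aligned}
\text{Hyp. 2.2 :}\quad & n\,h_n = n^{1-a} \to \infty,\quad n\,k_n = n^{1-b} \to \infty && \Longleftrightarrow\quad a < 1,\ b < 1 \\
\text{Hyp. 4.3 :}\quad & n\,h_n^{5} = n^{1-5a} \to 0,\quad n\,k_n^{5} = n^{1-5b} \to 0 && \Longleftrightarrow\quad a > \tfrac{1}{5},\ b > \tfrac{1}{5} \\
\text{Hyp. 4.6 :}\quad & \log(n)\,\frac{k_n}{h_n} = \log(n)\, n^{a-b} \to 0 && \Longleftrightarrow\quad b > a
\end{aligned}
\]

so the three hypotheses hold together exactly when $\tfrac{1}{5} < a < b < 1$, for instance $a = \tfrac{1}{4}$ and $b = \tfrac{1}{3}$.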

Theorem 4.2   Under (H1, H4), and under hypotheses 4.1, 4.2 and 4.5, if the window-widths $h_n$ and $k_n$ satisfy hypotheses 4.3 and 4.6, we have :
\[
n\cdot k \int \big(\delta_{f,g}(z)\big)^2\, \varpi(z)\,dz
= \frac{1}{k}\,\sigma^2\cdot\int K^2(u)\,\nu_n(u)\,du
+ n k\cdot I_{n,2}
+ O_p(k) + O_p\!\left(\frac{1}{n\cdot k}\right)
\]

In this expression, we have :

as n® ¥


Remark :

This result is similar to a result given in Härdle and Mammen [26], in the context of the comparison of a parametric and a nonparametric curve estimate, based on the following criterion :

\[
T_n = n h^{p} \int \Big( f_n(x) - F(f_{\hat\theta})(x) \Big)^2\, \varpi(x)\,dx
\]

where $F(f_{\hat\theta})(x)$ is a kernel estimator of the regression of $f_\theta(X_i)$ (the parametric function) on the $X_i$'s for $\theta = \hat\theta$ :
\[
F(f_{\hat\theta})(x) = \frac{\displaystyle\sum_i f_{\hat\theta}(X_i)\cdot K\!\left(\frac{X_i - x}{h}\right)}{\displaystyle\sum_i K\!\left(\frac{X_i - x}{h}\right)}
\]



The remaining bias $\frac{1}{k}\,\sigma^2\cdot\int K^2(u)\,\nu_n(u)\,du$ in the asymptotic decomposition of our global statistic may be estimated easily in order to obtain an unbiased statistic, but it seems more useful to consider alternative methods. In the next section we study Bootstrap methods as an alternative to asymptotic approximations.

5   Bootstrap





A   Mathematical Appendix

A.1   Proof of theorem 4.2

Let $\psi = \int \big(\delta_{f,g}(z)\big)^2\, \varpi(z)\,dz$ be the object of our asymptotic study.

Using our definition for the weight function, $\varpi(z) = \varphi^2(z)\cdot\nu_n(z)$, we get :

\[
\psi = \frac{1}{n^2k^2}\int \left( \sum_i \big(Y_i - f_n(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \right)^{2} \nu_n(z)\,dz
\]

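which may be decomposed as :
\[
\begin{aligned}
\psi &= \frac{1}{n^2k^2}\int \left( \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) + \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \right)^{2} \nu_n(z)\,dz \\
&= \frac{1}{n^2k^2}\int \big( Q_1(z) + Q_2(z) \big)^2\, \nu_n(z)\,dz
\end{aligned}
\]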

with :
\[
Q_1(z) = \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right)
\quad\text{and}\quad
Q_2(z) = \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right)
\]

From this expression we get three terms, corresponding to the integrals of $Q_1$ squared and $Q_2$ squared and to the cross product respectively :
\[
F_1 = \frac{1}{n^2k^2}\int \left[ \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \right]^{2} \nu_n(z)\,dz
\]
\[
F_2 = \frac{1}{n^2k^2}\int \left[ \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \right]^{2} \nu_n(z)\,dz
\]
and
\[
F_3 = \frac{1}{n^2k^2}\int \left[ \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \right]\cdot\left[ \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \right] \nu_n(z)\,dz
\]

Overall, we have :

\[
\psi = F_1 + F_2 + 2\cdot F_3
\]

and we'll prove that

A.1.1   Study of F1




The expression of F1 is :

\[
F_1 = \frac{1}{n^2k^2}\int \left[ \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \right]^{2} \nu_n(z)\,dz
\]

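We can separate this study into two parts, using the following decomposition :
\[
\begin{aligned}
F_1 &= \frac{1}{n^2k^2}\int \sum_i \big(Y_i - f(X_i)\big)^2\, K^2\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz \\
&\quad + 2\cdot\frac{1}{n^2k^2}\int \mathop{\sum\sum}_{i<j} \big(Y_j - f(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz \\
&= I_{n,1} + 2\cdot I_{n,2}
\end{aligned}
\]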

These terms In,1 and In,2 have a very different asymptotic behavior, we show that :

while

We shall denote by E the expectation conditional on $X_i$ and $Z_i$ :

First we show that :

\[
(nk)^4\cdot E\Big[ \big( I_{n,1} - E[I_{n,1}] \big)^2 \Big]
= \sum_i E\Big[ \big(Y_i - f(X_i)\big)^2 - \sigma^2 \Big]^{2} \cdot \left[ \int K^2\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz \right]^{2}
\]

We note that :
\[
\int K^2\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz \;\le\; k \int K^2(u)\,\nu(u)\,du
\]

Under 4.5 and the weak law of large numbers :
\[
(nk)^4\cdot E\Big[ \big( I_{n,1} - E[I_{n,1}] \big)^2 \Big]
= O_p\big( (nk)^2 \big) \sum_i f_4(X_i)
= O_p\big( n k^2 \big)
\]

It now follows via Chebyshev's inequality that if $(\lambda_n)_{n\in\mathbb{N}}$ is any sequence of constants diverging to $\infty$,

\[
\Pr\left\{ \big| I_{n,1} - E[I_{n,1}] \big| > \lambda_n\cdot\frac{n k^2}{(nk)^4} \;\middle|\; X_i, Z_i \right\} \underset{n\to\infty}{\longrightarrow} 0
\]

so
\[
I_{n,1} = E[I_{n,1}] + O_p\!\left( \frac{n k^2}{(nk)^4} \right)
\]

which proves i).


This term gives the asymptotic normality in (4.2) :
\[
(nk)^2\, I_{n,2} = \mathop{\sum\sum}_{i<j} \big(Y_j - f(X_j)\big)\big(Y_i - f(X_i)\big) \int K\!\left(\frac{Z_j - z}{k}\right) K\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz
\]

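We may rewrite this term as :
\[
(nk)^2\, I_{n,2} = \mathop{\sum\sum}_{i<j} \big(Y_j - f(X_j)\big)\big(Y_i - f(X_i)\big)\, W_{n,i,j},
\qquad W_{n,i,j} = \int K\!\left(\frac{Z_j - z}{k}\right) K\!\left(\frac{Z_i - z}{k}\right) \nu_n(z)\,dz
\]
or
\[
(nk)^2\, I_{n,2} = \sum_{i=2}^{n} Y_{n,i} \qquad (3)
\]
where
\[
Y_{n,i} = \big(Y_i - f(X_i)\big) \sum_{j=1}^{i-1} \big(Y_j - f(X_j)\big)\, W_{n,i,j}, \qquad 2 \le i \le n
\]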

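Let $\mathcal{F}_{n,i}$ denote the $\sigma$-field generated by $S_1,\dots,S_i$ (where $S_i=(X_i,Y_i,Z_i)$); then
\[
E[\,Y_{n,i} \mid \mathcal{F}_{n,i-1}\,] = 0 \quad \text{a.s.} \quad \forall i
\]
therefore the sequence
\[
\left\{ \left( M_{n,i} = \sum_{j=2}^{i} Y_{n,j},\ \mathcal{F}_{n,i} \right),\ 2 \le i \le n \right\}
\]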

is a martingale triangular array (see Hall [23]). The conditional variance of $M_{n,n}$ is given by :
\[
V_n = \sum_{i=2}^{n} E\big[\,Y_{n,i}^2 \mid \mathcal{F}_{n,i-1}\,\big]
= \sum_{i=2}^{n} \sigma^2 \left[ \sum_{j=1}^{i-1} \big(Y_j - f(X_j)\big)\, W_{n,i,j} \right]^{2}
\]

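which may be split into the sum of the squared terms and the double cross-product :
\[
\begin{aligned}
V_n &= n\,\sigma^2 \sum_{j=1}^{i-1} \big(Y_j - f(X_j)\big)^2\, W_{n,i,j}^2 \\
&\quad + 2 n\,\sigma^2 \sum_{1\le j\le l\le i-1} \big(Y_j - f(X_j)\big)\big(Y_l - f(X_l)\big)\, W_{n,i,j}\, W_{n,i,l} \\
&= V_{n,1} + V_{n,2}
\end{aligned}
\]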

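Hall ([23], lemmas 1 and 2), on the basis of a central limit theorem due to Brown [8], gives us the following result :
\[
\frac{1}{n\cdot k^{3/2}}\cdot M_{n,n} \overset{D}{\longrightarrow} N\!\left(0,\ \frac{1}{4}\,\alpha_1\right)
\]
where $\alpha_1$ is defined by :
\[
\frac{1}{n^2\cdot k^3}\cdot V_{n,1} \longrightarrow \frac{1}{4}\,\alpha_1
\]
and
\[
\alpha_1 = 2\cdot\sigma^4 \left[ \int \varphi^2(x)\,\nu(x)\,dx \right]\cdot\left[ \int \left[ \int K(u)\,K(u+v)\,du \right]^{2} dv \right]
\]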

Substituting this result into (3), we get :

\[
\frac{1}{n\cdot k^{3/2}}\cdot M_{n,n} = n\sqrt{k}\cdot I_{n,2} \overset{D}{\longrightarrow} N\!\left(0,\ \frac{1}{4}\cdot\alpha_1\right)
\]

which proves the announced convergence and therefore ii).


so :
\[
F_1 = \frac{1}{nk}\,\sigma^2\cdot\int K^2(u)\,\nu_n(u)\,du + 2\cdot I_{n,2} + O_p\!\left( \frac{1}{n^3\cdot k^2} \right)
\]

A.1.2   Study of F2

We have to check that $n\cdot k\cdot F_2$ vanishes asymptotically, where $F_2$ is :

\[
F_2 = \frac{1}{n^2k^2}\int \left[ \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \right]^{2} \nu_n(z)\,dz
\]

The study of this term is simplified by the use of hypothesis 4.6, because
\[
F_2 \le \left( \sup_{X_j} \big| f_n(X_j) - f(X_j) \big| \right)^{2} \int \left[ \frac{1}{nk}\cdot\sum_j K\!\left(\frac{Z_j - z}{k}\right) \right]^{2} \nu_n(z)\,dz
\]

and

From Györfi, Härdle, Sarda and Vieu [22] (or Bierens [4]), we have, under hypotheses 4.2 and 4.3 :
\[
\sup_{X_j} \big| f_n(X_j) - f(X_j) \big| = O_p\!\left( \max\left( \sqrt{\frac{\log(n)}{n\cdot h}}\ ,\ h^{d} \right) \right)
\]
so that
\[
\left( \sup_{X_j} \big| f_n(X_j) - f(X_j) \big| \right)^{2} = O_p\!\left( \max\left( \frac{\log(n)}{n\cdot h}\ ,\ h^{2\cdot d} \right) \right)
\]

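Then, with d=2 :
\[
(n\cdot k)\cdot F_2 \le O_p\!\left( \max\left( \frac{n\cdot k\cdot\log(n)}{n\cdot h}\ ,\ n\cdot k\, h^{4} \right) \right)\cdot \int \varphi^2(z)\,\nu_n(z)\,dz
\]

Under 4.6, we check that :
\[
\frac{n\cdot k\cdot\log(n)}{n\cdot h} \underset{n\to\infty}{\longrightarrow} 0
\quad\text{and}\quad
n\cdot k\, h^{4} \le n\cdot h^{5} \underset{n\to\infty}{\longrightarrow} 0
\]

so that the term $(n\cdot k)\cdot F_2$ tends to 0 in probability.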

A.1.3   Study of F3

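\[
\begin{aligned}
F_3 &= \frac{1}{n^2k^2}\int \left[ \sum_i \big(Y_i - f(X_i)\big)\, K\!\left(\frac{Z_i - z}{k}\right) \right] \left[ \sum_j \big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - z}{k}\right) \right] \nu_n(z)\,dz \\
&= \frac{1}{n^2k} \sum_i \sum_j \big(Y_i - f(X_i)\big)\big(f(X_j) - f_n(X_j)\big) \int \frac{1}{k}\, K\!\left(\frac{Z_i - z}{k}\right) K\!\left(\frac{Z_j - z}{k}\right) \nu_n(z)\,dz
\end{aligned}
\]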

The term involving the kernel product may be viewed as
\[
\int \frac{1}{k}\cdot\Upsilon(z)\, K\!\left(\frac{Z_j - z}{k}\right) dz
\]
with $\Upsilon(z) = K\!\left(\frac{Z_i - z}{k}\right)\nu_n(z)$, and is asymptotically equivalent to :
\[
K\!\left(\frac{Z_j - Z_i}{k}\right) \nu(Z_i)
\]

So, we shall concentrate our attention on the study of
\[
F_3 = \frac{1}{n^2k} \sum_i \sum_j \big(Y_i - f(X_i)\big)\big(f(X_j) - f_n(X_j)\big)\, K\!\left(\frac{Z_j - Z_i}{k}\right) \nu(Z_i)
\]

For $i=1,\dots,n$, let $\varepsilon_i = Y_i - f(X_i)$. In order to prove that $n\cdot k\, F_3 \to 0$ in probability, we are going to study the first two moments of $F_3$.

Obviously
\[
E[F_3] = 0
\]

Let us study $F_3^2$ :
\[
F_3^2 = \frac{1}{n^4k^2} \left( \sum_i \sum_j \varepsilon_i\cdot\big(f(X_j) - f_n(X_j)\big)\cdot K\!\left(\frac{Z_j - Z_i}{k}\right) \nu(Z_i) \right)^{2}
\]

Before expanding this sum, we may substitute the expression of $f(X_j) - f_n(X_j)$ in order to obtain the triple sum (squared) :
\[
F_3^2 = \frac{1}{n^4k^2} \left( \sum_i \sum_j \frac{1}{nh} \sum_l \varepsilon_i\cdot\varepsilon_l\cdot K\!\left(\frac{X_l - X_j}{h}\right) K\!\left(\frac{Z_j - Z_i}{k}\right) \frac{\nu(Z_i)}{\varphi(X_j)} \right)^{2}
\]

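If we expand this sum we have :

\[
\begin{aligned}
F_3^2 = \frac{1}{n^4k^2}\,\frac{1}{n^2h^2} \sum_{i,j,l,i',j',l'} & \varepsilon_i\,\varepsilon_{i'}\,\varepsilon_l\,\varepsilon_{l'}\cdot K\!\left(\frac{X_l - X_j}{h}\right) K\!\left(\frac{X_{l'} - X_{j'}}{h}\right) \\
& \times K\!\left(\frac{Z_j - Z_i}{k}\right) K\!\left(\frac{Z_{j'} - Z_{i'}}{k}\right)\cdot\frac{\nu(Z_i)}{\varphi(X_j)}\cdot\frac{\nu(Z_{i'})}{\varphi(X_{j'})}
\end{aligned}
\]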

The purpose of this decomposition is not to confuse the reader, but to show that the expectation of the terms involved in this sum is zero, except for a few values of $(i,j,l,i',j',l')$. In other words

\[
E\left[ \varepsilon_i\,\varepsilon_{i'}\,\varepsilon_l\,\varepsilon_{l'}\cdot K\!\left(\frac{X_l - X_j}{h}\right) K\!\left(\frac{X_{l'} - X_{j'}}{h}\right) K\!\left(\frac{Z_j - Z_i}{k}\right) K\!\left(\frac{Z_{j'} - Z_{i'}}{k}\right)\cdot\frac{\nu(Z_i)}{\varphi(X_j)}\,\frac{\nu(Z_{i'})}{\varphi(X_{j'})} \right] = 0
\]

Except

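Following this decomposition of the indexes, we have :
\[
\begin{aligned}
E[F_3^2] &= \frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[A]
+ \frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{j,j',i,l} E[C_1] \\
&\quad + \frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{j,j',i,i'} E[C_2]
+ \frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{j,j',i,l} E[C_3]
\end{aligned}
\]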

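Let us study first
\[
\begin{aligned}
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[A]
= \frac{1}{n^4} \Bigg( \sum_i f_4(X_i) \times \sum_{j,j'} & \frac{1}{nh^2}\, K\!\left(\frac{X_i - X_j}{h}\right) K\!\left(\frac{X_i - X_{j'}}{h}\right) \\
& \cdot \frac{1}{nk^2}\, K\!\left(\frac{Z_j - Z_i}{k}\right) K\!\left(\frac{Z_{j'} - Z_i}{k}\right)\cdot\frac{\nu(Z_i)}{\varphi(X_j)}\,\frac{\nu(Z_i)}{\varphi(X_{j'})} \Bigg)
\end{aligned}
\]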

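Reorganizing the terms, we get :
\[
\begin{aligned}
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[A]
= \frac{1}{n^4} \Bigg( \sum_i f_4(X_i) & \times \sum_j \frac{1}{nh^2}\, K\!\left(\frac{X_i - X_j}{h}\right) K\!\left(\frac{Z_j - Z_i}{k}\right) \Big/ \varphi(X_j) \\
& \cdot \sum_{j'} \frac{1}{nk^2}\, K\!\left(\frac{X_i - X_{j'}}{h}\right) K\!\left(\frac{Z_{j'} - Z_i}{k}\right) \Big/ \varphi(X_{j'})\cdot\nu^2(Z_i) \Bigg)
\end{aligned}
\]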

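The term
\[
\sum_j \frac{1}{nh^2}\, K\!\left(\frac{X_i - X_j}{h}\right) K\!\left(\frac{Z_j - Z_i}{k}\right) \Big/ \varphi(X_j) \longrightarrow \frac{\varphi(X_i, Z_i)}{\varphi(X_i)} = \varphi(Z_i \mid X_i)
\]
is assumed to be bounded, so under 4.5 and the law of large numbers
\[
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[A] = O_p(1)\cdot\frac{1}{n^4} \sum_i f_4(X_i)\,\nu^2(Z_i) = O_p\!\left( \frac{1}{n^3} \right)
\]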

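Then, for the term $\frac{1}{n^6k^2}\frac{1}{h^2}\sum_{j,j',i,l} E[C_1]$ :
\[
\begin{aligned}
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[C_1]
= \frac{1}{n^4} \Bigg( \sum_{i,l} \sigma^2\sigma^2 \times \sum_{j,j'} & \frac{1}{nh^2}\, K\!\left(\frac{X_i - X_j}{h}\right) K\!\left(\frac{X_i - X_{j'}}{h}\right) \\
& \times \frac{1}{nk^2}\, K\!\left(\frac{Z_j - Z_i}{k}\right) K\!\left(\frac{Z_{j'} - Z_i}{k}\right)\cdot\frac{\nu(Z_i)}{\varphi(X_j)}\,\frac{\nu(Z_i)}{\varphi(X_{j'})} \Bigg)
\end{aligned}
\]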

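As in the previous expression, we may group the terms in order to obtain a similar expression :
\[
\begin{aligned}
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[C_1]
= \frac{1}{n^4} \Bigg( \sum_{i,l} \sigma^4 & \times \sum_j \frac{1}{nh^2}\, K\!\left(\frac{X_i - X_j}{h}\right) K\!\left(\frac{Z_j - Z_i}{k}\right) \Big/ \varphi(X_j) \\
& \times \left( \sum_{j'} \frac{1}{nk^2}\, K\!\left(\frac{X_i - X_{j'}}{h}\right) K\!\left(\frac{Z_{j'} - Z_i}{k}\right) \Big/ \varphi(X_{j'}) \right)\cdot\nu^2(Z_i) \Bigg)
\end{aligned}
\]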

This time, we get
\[
\frac{1}{n^6k^2}\,\frac{1}{h^2} \sum_{i,j,j'} E[C_1] = O_p(1)\cdot\frac{1}{n^4}\,\sigma^4 \sum_{i,l} \nu^2(Z_i) = O_p\!\left( \frac{1}{n^2} \right)
\]

Finally, we have :
\[
E[F_3^2] = O_p\!\left( \frac{1}{n^3} + \frac{1}{n^2} \right) = O_p\!\left( \frac{1}{n^2} \right)
\]
and then
\[
E\Big[ (nk\cdot F_3)^2 \Big] = O_p(k)
\]
which is the announced result for $F_3$.


We get the final result by adding the three terms F1, F2 and F3 .

References

[1]
Aït-Sahalia, Y. (92) : ``The Delta and Bootstrap Methods for Nonlinear Functionals of Nonparametric Kernel Estimators Based on Dependent Multivariate Data'', Mimeo, Department of Economics, MIT.

[2]
Amemiya T. (80) : ``Selection of Regressors'', International Economic Review, 21(2), pp. 331-354.

[3]
Atkinson A.C. (70) : ``A Method for Discriminating Between Models'', Journal of the Royal Statistical Society, series B, no 32, pp. 323-344.

[4]
Bierens H.J. (87) : `` Kernel Estimators of Regression Functions '' in Advances in econometrics, Fifth World Congres, Vol.1, T.F. Bewley.

[5]
Bontemps C. (95) : `` Enveloppement dans les Modèles de Regression Paramétriques et Non-parametriques'', Ph.D Dissertation, University of Social Sciences, Toulouse.

[6]
Bontemps C., J.P. Florens and J.F. Richard (93) : ``Encompassing in Regression Models : Parametric and Nonparametric Procedures'', Mimeo, Gremaq, University of Social Sciences, Toulouse

[7]
Bosq D. et J.F. Lecoutre (87) : `` Théorie de l'estimation fonctionnelle'' Economica.

[8]
Brown B. M. (71) : ``Martingale Central Limit Theorems'', Annals of Mathematical Statistics, Vol 42, no 1, pp. 59-66.

[9]
Collomb G. (76) : `` Estimation Non-paramétrique de la Régression par la Méthode du Noyau'', Thèse, Université Paul Sabatier, Toulouse.

[10]
Collomb G. (77) : `` Estimation Non-paramétrique de la Régression par la Méthode du Noyau : Propriété de Convergence Asymptotiquement Normale Indépendante'', Annales Scientifiques de l'Université de Clermont, Vol. 15, pp.24-26.

[11]
Cox D.R.(61) : `` Tests of Separate Families of Hypotheses'' in Proceeding of the fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1,University of California Press, Berkeley, pp. 105-123.

[12]
Cox D.R.(62) : ``Further Results on Tests of Separate Families of Hypotheses'', Journal of the Royal Statistical Society, Series B, no. 24, pp. 406-424.

[13]
Dhaene G. (93) : `` Encompassing : Formulation, Properties and Testing '' Ph. D Dissertation Université Catholique de Louvain, Louvain-la-Neuve.

[14]
Engle R.F. , D.F.Hendry and J.F. Richard (83) : ``Exogeneity'', Econometrica, No. 55, pp 277-304.

[15]
Florens J.P., D.F. Hendry and J. F. Richard (94) : `` Encompassing and Specificity '' Mimeo, Gremaq, Université des Sciences Sociales de Toulouse.

[16]
Florens J.P., S. Larribeau et M. Mouchart (92) : `` Bayesian encompassing tests of a unit root hypothese.'' Mimeo, Gremaq, No 92.27.

[17]
Gourieroux C. et A. Monfort (91) : ``Testing Non Nested Hypotheses '', Mimeo No 9207, Crest-Cepremap-Insee.

[18]
Gourieroux C. et A. Monfort (92) : ``Testing Encompassing and Simulating Dynamic Econometric Models '', Mimeo No 9214, Crest-Cepremap-Insee.

[19]
Gourieroux C., A. Monfort et E. Renault (92) : ``Indirect Inference '' Mimeo, Crest-Cepremap, Insee, Gremaq.

[20]
Gourieroux C., A. Monfort et A. Trognon (83) : `` Testing Nested or Nonnested Hypotheses'' Journal of Econometrics, Vol.21. pp 83-115.

[21]
Govaerts B., D. Hendry and J.F. Richard (94) : ``Encompassing in Stationary Linear Dynamic Models'', Journals of Econometrics, Vol. 63, pp. 245-270.

[22]
Györfi L., W. Härdle, P. Sarda and P. Vieu (89) : ``Nonparametric Curve Estimation from Time Series'', Springer Verlag.

[23]
Hall P. (84) : `` Integrated Square Error Properties of Kernel Estimators of Regression Functions '', Annals of Statistics, Vol. 12, no 1, pp.241-260.

[24]
Härdle W. (90) : `` Applied Nonparametric Regression ''. Econometric society monographs, Cambridge University Press.

[25]
Härdle W., P. Hall and J.S. Marron (92) : ``Regression smoothing that are not far from their optimum'', Journal of the Royal Statistical Society, Series B, Vol.87, no. 417, pp. 227-233.

[26]
Härdle W. and E. Mammen (93) : ``Comparing nonparametric versus parametric regression fits'', Annals of Statistics, Vol. 21, no 4, pp. 1926-1947.

[27]
Härdle W. and J.S. Marron (89) : ``Optimal bandwidth selection in nonparametric procedure regression function estimation '', Annals of statistics, Vol.13, No.4.

[28]
Hausman J.A. (78) : `` Specification test in Econometrics'', Econometrica, vol 46, no 6.

[29]
Hendry D. (93) : `` The Roles of Economic Theory and Econometrics in Time Series Economics'' , Mimeo, Nuffield College, Oxford.

[30]
Hendry D. and J.F. Richard (89) : `` Recent development in the theory of encompassing '' in Contribution to operation research and economics, MIT Press.

[31]
Lu M. and G. E. Mizon (93) : `` The Encompassing Principle and Specification Test'', Mimeo, Economics Departement, Southampton University, UK.

[32]
Marron J. S. (88) : ``Automatic Smoothing Parameter Selection : A Survey'', Empirical Economics, Vol. 13, pp. 187-208.

[33]
Mizon G. E. (84): `` The Encompassing Approach in Econometrics '' In Econometrics and Quantitative Economics, edited by D. F. Hendry and K. F. Wallis. Ch. 6 Oxford : Basil & Blackwell

[34]
Mizon G. E. and J. F. Richard (86) : `` The encompassing principle and its application to testing non-nested hypotheses '' Econometrica, Vol.54, No 3.

[35]
Nadaraya E.A. (64) : ``On estimating regression'', Theory of Probability and its Applications, 9, pp. 141-142.

[36]
Pesaran M.H. (74) : ``On the General Problem of Model Selection'', Review of Economic Studies, no 41, pp. 153-171.

[37]
Sawa T. (78) : ``Information Criteria for Discriminating Among Alternative Regression Models'', Econometrica, no. 46, pp. 1273-92.

[38]
Serfling (80) : ``Approximation theorems '', Wiley series in probability and mathematical statistics, Wiley & Sons.

[39]
Stone, C. J. (82) : ``Optimal global rates of convergence for nonparametric regression'', Annals of Statistics, Vol 10, no 4, pp. 1040-1053.

[40]
Vieu P. (93) : ``Bandwidth selection for kernel regression'', Computational Statistics and Data Analysis, à paraître.

[41]
Watson, G.S. (64) : ``Smooth regression analysis'', Sankhya, Series A, 26, pp. 359-372.

[42]
White H. (80) : ``Using least squares to approximate unknown regression function '' International Economic Review, Vol.21(1).

[43]
White H. (82) : ``Maximum likelihood estimator of misspecified models '' Econometrica, Vol.50, pp. 1--26.

[44]
White H. (84) : ``Asymptotic Theory for Econometricians'', A Series of Monographs and Textbooks, Academic Press.

1
We are grateful to Jean-François Richard for his comments on previous work.
2
For ease of notation, $\varphi(\cdot)$ will be generically used to represent the joint density of $S_i$, as well as its marginal or conditional densities, all ambiguities being resolved by the list of arguments.
3
Expectations relative to $P_0$ are generically represented by the letter E.
