Hindawi Publishing Corporation Abstract and Applied Analysis Volume 2013, Article ID 687151, 11 pages http://dx.doi.org/10.1155/2013/687151

Research Article

Remodeling and Estimation for Sparse Partially Linear Regression Models

Yunhui Zeng,1,2 Xiuli Wang,3 and Lu Lin1

1 Shandong University Qilu Securities Institute for Financial Studies and School of Mathematical Science, Shandong University, Jinan 250100, China

2 Supercomputing Center, Shandong Computer Science Center, Jinan 250014, China College of Mathematics Science, Shandong Normal University, Jinan 250014, China

Correspondence should be addressed to Lu Lin; linlu@sdu.edu.cn

Received 11 October 2012; Accepted 14 December 2012

Academic Editor: Xiaodi Li

Copyright © 2013 Yunhui Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

When the dimension of the covariates in a regression model is high, one usually uses a submodel containing the significant variables as a working model. But this submodel may be highly biased, and the resulting estimator of the parameter of interest may be very poor when the coefficients of the removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to obtain a consistent estimator of the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and simultaneously obtain a new estimator of the parameter of interest. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained from the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.

1. Introduction

Consider the following partially linear regression model:

Y = β^T X + γ^T Z + f(U) + ε,  (1)

where Y is a scalar response variable, X and Z are, respectively, p-dimensional and q-dimensional continuous-valued covariates with p being finite and p < q, β is the parameter vector of interest, γ is the nuisance parameter vector, which is supposed to be sparse in the sense that ||γ||_2 is small, f(·) is an unknown function satisfying Ef(U) = 0 for identification, and ε is the random error satisfying E(ε | X, Z, U) = 0. For simplicity, we assume that U is univariate. Let (Y_i, X_i, Z_i, U_i), i = 1, ..., n, be i.i.d. observations of (Y, X, Z, U) obtained from the above model.
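As a concrete illustration, data from a model of the form (1) can be generated as follows. This is our own sketch: the dimensions p and q, the coefficient values, and the choice f(u) = sin(2πu) are hypothetical and chosen only to make the example self-contained, not the paper's settings.

```python
import numpy as np

def simulate_plm(n, p=3, q=10, seed=0):
    """Draw n i.i.d. observations from Y = beta'X + gamma'Z + f(U) + eps."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p)                       # parameter of interest
    gamma = 0.05 * rng.standard_normal(q)   # "sparse" nuisance part: small norm
    X = rng.standard_normal((n, p))
    Z = rng.standard_normal((n, q))
    U = rng.uniform(0.0, 1.0, size=n)
    f = np.sin(2 * np.pi * U)               # satisfies Ef(U) = 0 on [0, 1]
    eps = 0.3 * rng.standard_normal(n)      # E(eps | X, Z, U) = 0
    Y = X @ beta + Z @ gamma + f + eps
    return Y, X, Z, U, beta, gamma

Y, X, Z, U, beta, gamma = simulate_plm(n=200)
```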

A feature of the model is that the parametric part contains both the parameter vector of interest and the nuisance parameter vector. The reason for this coefficient separation is as follows. In practice we often use such a model to distinguish the main treatment variables of interest from the state variables. For instance, in a clinical trial, X consists of treatment variables and can be easily controlled, while Z is a vector of many clinical variables, such as patient ages and body weights. The variables in Z may have an impact on Y but are not of primary interest, and their effects may be small. In order to make up for potentially nonnegligible effects on the response Y, the nuisance covariates Z are introduced into model (1); see Shen et al. [1]. Model (1) contains all relevant covariates, and in this paper we call it the full model.

The purpose of this paper is to estimate β, the parameter of interest, when γ^T Z is removed from the model. The main idea is remodeling based on the following working model:

Y = β^T X + f(U) + η.  (2)

As is known, E(η | X = x, U = u) is a nonzero function if γ^T E(Z | X, U) ≠ 0, which relies on two elements: one is E(Z | X, U), related to the correlation between the covariates Z and (X, U); the other is γ, the nuisance parameter of the removed part. Thus the least squares estimator based on model (2) may be inconsistent. In the following, we will make use of these two elements. Specifically, in the first stage, we shall construct a remodeled model by a multistep adjustment to correct the submodel bias based on the correlation information between the covariates. This adjustment is motivated by Gai et al. [2]. In that paper, they proposed a nonparametric adjustment that adds a univariate nonparametric estimator to the working model (2), which can dramatically reduce the bias of the working model. But this only holds on a subset of the covariates, although the subset may be fairly large. In order to obtain a globally unbiased working model for the linear regression model, Zeng et al. [3] adjusted the working model in multiple steps. Because only those variables in Z correlated with (X, U) may have an impact on the estimation of β, in each step a univariate nonparametric part was added to the working model, and consequently a globally unbiased working model was obtained.

However, when many components of Z are correlated with (X, U), the number of nonparametric functions added to the above working model is large. Such a model is improper in practice. Thus, in the second stage, we further simplify the above adjusted model by a semiparametric variable selection procedure proposed by Zhao and Xue [4]. Their method can select significant parametric and nonparametric components simultaneously under a sparsity condition for semiparametric varying-coefficient partially linear models. Relevant papers include Fan and Li [5] and Wang et al. [6, 7], among others. After the two-stage remodeling, the final model is conditionally unbiased. Based on this model, the estimation and model prediction are significantly improved.

The rest of this paper is organized as follows. In Section 2, a multistep adjustment and the remodeled models are first proposed; then the models are further simplified via the semiparametric SCAD variable selection procedure. A new estimator of the parameter of interest based on the simplified model is derived, and its convergence rate and asymptotic normality are also obtained. Simulations are given in Section 3. A short conclusion and some remarks are contained in Section 4. Some regularity conditions and theoretical proofs are presented in the appendix.

2. New Estimator for the Parameter of Interest

In this paper, we suppose that the covariate Z has zero mean, p is finite and p << q, E(ε | X, Z, U) = 0 and Var(ε | X, Z, U) = σ². We also assume that the covariates X and U and the parameter β are prespecified, so that the submodel (2) is a fixed model.

2.1. Multistep-Adjustment by Correlation. In this subsection, we first adjust the submodel to be conditionally unbiased by a multistep-adjustment.

When Z is normally distributed, the principal component analysis (PCA) method will be used. Let Σ_Z be the covariance matrix of Z; then there exists an orthogonal q × q matrix Q such that Q Σ_Z Q^T = Λ, where Λ is the diagonal matrix diag(λ_1, λ_2, ..., λ_q) with λ_1 ≥ λ_2 ≥ ... ≥ λ_q > 0 being the eigenvalues of Σ_Z. Denote Q^T = (τ_1, τ_2, ..., τ_q) and Z^(j) = τ_j^T Z.
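The PCA step above can be sketched numerically. In the sketch below (ours), the sample covariance stands in for the population Σ_Z, the columns of `tau` estimate the directions τ_j, and the resulting components Z^(j) = τ_j^T Z are empirically uncorrelated; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 500, 6
A = rng.standard_normal((q, q))
Z = rng.standard_normal((n, q)) @ A.T          # correlated covariates, mean ~ 0
Sigma_hat = np.cov(Z, rowvar=False)            # sample covariance matrix S
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]              # reorder: lambda_1 >= ... >= lambda_q
lam, tau = eigvals[order], eigvecs[:, order]   # tau[:, j-1] estimates tau_j
Z_comp = Z @ tau                               # columns are Z^(1), ..., Z^(q)
C = np.cov(Z_comp, rowvar=False)               # should be (nearly) diagonal
```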

When Z is centered but nonnormally distributed, we shall apply the independent component analysis (ICA) method. Assume that Z is generated by a nonlinear combination of independent components Z̃^(j), that is, Z = F(Z̃), where F(·) is an unknown nonlinear mapping from R^q to R^q and Z̃ is an unknown random vector with independent components. By imposing some constraints on the nonlinear mixing mapping F or the independent components Z̃^(j), the independent components Z̃^(j) can be properly estimated. See Simas Filho and Seixas [8] for an overview of the main statistical principles and some algorithms for estimating the independent components. For simplicity, in this paper we suppose that Z = (Z^(1), ..., Z^(q))^T with Z^(l) = Σ_{j=1}^q F_lj(Z̃^(j)), l = 1, ..., q, where the F_lj(·) are scalar functions.

In the above two cases, the Z^(j)'s are independent of each other. Set K_0 to be the size of the set M_0 = {j : E(Z^(j) | X, U) ≠ 0, 1 ≤ j ≤ q}. Without loss of generality, let M_0 = {1, ..., K_0}.

We construct the following adjusted model:

Y = β^T X + Σ_{j=1}^{K_0} g_j(Z^(j)) + f(U) + ζ_{K_0},  (3)

where g_j(Z^(j)) = E(Y − β^T X − f(U) | Z^(j)) = γ^T E(Z | Z^(j)), j = 1, ..., K_0, and ζ_{K_0} = Y − β^T X − g_1(Z^(1)) − ... − g_{K_0}(Z^(K_0)) − f(U). The model (3) is based on the population of Z and depends on the distributions of X, U and Z. It is easy to see that model (3) is conditionally unbiased, that is, E(ζ_{K_0} | X, U, Z^(j), 1 ≤ j ≤ K_0) = 0.

The adjusted model (3) is an additive partially linear model, in which β^T X is the parametric part, f(U) and g_j(Z^(j)), j = 1, ..., K_0, are the nonparametric parts, and ζ_{K_0} is the random error. Compared with the submodel (2), the nonparametric parts g_j(Z^(j)), j = 1, ..., K_0, may be regarded as bias-corrected terms for the random error η. For centered Z, E(g_j(Z^(j))) = 0, j = 1, ..., K_0, so the nonparametric components g_1(Z^(1)), ..., g_{K_0}(Z^(K_0)) can be properly identified. In fact, centered Z can be relaxed to any Z satisfying γ^T E(Z) = 0.

When Z is centered and normally distributed, the nonparametric parts are linear: g_j(Z^(j)) = a_j Z^(j), j = 1, ..., K_0. So the multistep adjusted model (3) is really a partially linear model

Y = β^T X + α^T Z_K + f(U) + ζ_{K_0},  (4)

with α = (a_1, ..., a_{K_0})^T and Z_K = (Z^(1), ..., Z^(K_0))^T. Especially, when f(U) = 0, the full model is a linear model, and the multistep adjusted model is also a linear model

Y = β^T X + α^T Z_K + ζ_{K_0}.  (5)

But when the variables in Z are not jointly normal, the nonparametric parts g_j can be highly nonlinear, which is similar to the results on marginal regression; see Fan et al. [9].
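Under joint normality, the linearity of the bias-correction terms can be checked by a small Monte Carlo experiment of our own: since E(Z | Z^(j)) = τ_j Z^(j) when Z^(j) = τ_j^T Z is a principal component, regressing γ^T Z on Z^(j) should recover the slope a_j = γ^T τ_j. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
q, n = 5, 200_000
A = rng.standard_normal((q, q))
Sigma = A @ A.T                                  # population covariance of Z
gamma = rng.standard_normal(q)                   # nuisance coefficient vector
Z = rng.multivariate_normal(np.zeros(q), Sigma, size=n)
lam, tau = np.linalg.eigh(Sigma)                 # population eigenpairs
j = q - 1                                        # direction of largest eigenvalue
Zj = Z @ tau[:, j]                               # the component Z^(j)
# empirical slope of the regression of gamma'Z on Z^(j) ...
slope_mc = np.cov(Z @ gamma, Zj)[0, 1] / np.var(Zj, ddof=1)
# ... versus the theoretical slope a_j = gamma' tau_j
slope_theory = gamma @ tau[:, j]
```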

2.2. Model Simplification. When most of the features in the full model are correlated, K_0 is very large and may even be close to q. In this case, the adjusted model (3) is improper in practice, so we shall use the group SCAD regression procedure proposed by Wang et al. [6] and the semiparametric variable selection procedure proposed by Zhao and Xue [4] to further simplify the model.

Let s = |M_t| with M_t = {1 ≤ j ≤ K_0 : E(g_j(Z^(j)))² > 0}, and assume that the model (3) is sparse, that is, s is small. We define the semiparametric penalized least squares as

F(β, g(·), f(·)) = Σ_{i=1}^n [Y_i − β^T X_i − Σ_{j=1}^{K_0} g_j(Z_i^(j)) − f(U_i)]² + n Σ_{j=1}^{K_0} p_{λ_j}(||g_j(Z^(j))||),  (6)

where ||g_j(Z^(j))|| = (E(g_j(Z^(j)))²)^{1/2}, and p_λ(·) is the SCAD penalty function with λ being a tuning parameter, defined through its derivative

p'_λ(w) = λ [ I(w ≤ λ) + ((aλ − w)_+ / ((a − 1)λ)) I(w > λ) ],  (7)

with a > 2, w > 0 and p_λ(0) = 0. In (6), g(·) denotes the set {g_j(Z^(j)), j = 1, ..., K_0}. Because the g_j are nonparametric functions, they cannot be used directly for minimization. Here we replace f(·) and g(·) by basis function approximations. For 1 ≤ j ≤ K_0, let {Ψ_jk, k = 1, ..., L} be orthogonal basis functions satisfying

∫_supp Ψ_jk(z) Ψ_jl(z) r_j(z) dz = δ_kl = { 0, k ≠ l; 1, k = l },  (8)

where r_j(z) is the density function of Z^(j). Similarly, let {Ψ_0k, k = 1, ..., L} be orthogonal basis functions satisfying the above condition with the support and density function of U. Denote Ψ_j(Z^(j)) = (Ψ_j1(Z^(j)), ..., Ψ_jL(Z^(j)))^T and Ψ_0(U) = (Ψ_01(U), ..., Ψ_0L(U))^T. Then g_j(Z^(j)) and f(U) can be approximated by

g_j(Z^(j)) ≈ d_j^T Ψ_j(Z^(j)),  f(U) ≈ v^T Ψ_0(U).  (9)

Denote ||d_j||_2 = (d_j^T d_j)^{1/2}; invoking that E(Ψ_j(Z^(j)) Ψ_j^T(Z^(j))) = I_L, the identity matrix, we get

F(β, d, v) = Σ_{i=1}^n {Y_i − β^T X_i − d^T Ψ_i − v^T Ψ_0i}² + n Σ_{j=1}^{K_0} p_{λ_j}(||d_j||_2),  (10)

where d = (d_1^T, ..., d_{K_0}^T)^T, Ψ_i = Ψ(Z_i) = Vec(Ψ_j(Z_i^(j))) and Ψ_0i = Ψ_0(U_i).

Denote by β̂, d̂ = (d̂_1^T, ..., d̂_{K_0}^T)^T and v̂ the least squares estimators based on the penalized function (10), that is, (β̂, d̂, v̂) = argmin_{β ∈ R^p, d_j ∈ R^L, v ∈ R^L} F(β, d, v). Let ĝ_j = ĝ_j(Z^(j)) = d̂_j^T Ψ_j(Z^(j)) and f̂ = f̂(U) = v̂^T Ψ_0(U); then ĝ_j is an estimator of g_j(Z^(j)) and f̂ is an estimator of f(U).

Let M̂_n = {1 ≤ j ≤ K_0 : d̂_j ≠ 0} and K_n = |M̂_n|. For simplicity, we assume that M_t = {1, 2, ..., s} and M̂_n = {1, 2, ..., K_n}. So we get the following simplified working model:

Y = β^T X + Σ_{j=1}^{K_n} g_j(Z^(j)) + f(U) + ζ_{K_n},  (11)

where g_j(Z^(j)) = E(γ^T Z | Z^(j)), j = 1, ..., K_n, and ζ_{K_n} = Y − β^T X − g_1(Z^(1)) − ... − g_{K_n}(Z^(K_n)) − f(U).

Under the assumption of sparsity, the model (11) contains all significant nonparametric functions and fully utilizes both the correlation information of the covariates and the model sparsity on the nuisance covariates.
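The SCAD penalty used in (6) and (10) can be written down directly. The sketch below is ours, following the standard definition of Fan and Li [5], with a = 3.7 as suggested later in the paper; it implements both the penalty p_λ(w) (in piecewise closed form) and its derivative p'_λ(w).

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """SCAD derivative p'_lambda(w) for w >= 0."""
    w = np.asarray(w, dtype=float)
    return lam * np.where(w <= lam, 1.0,
                          np.maximum(a * lam - w, 0.0) / ((a - 1.0) * lam))

def scad(w, lam, a=3.7):
    """SCAD penalty p_lambda(w) for w >= 0: linear, then quadratic, then flat."""
    w = np.asarray(w, dtype=float)
    return np.where(
        w <= lam, lam * w,
        np.where(w <= a * lam,
                 (2 * a * lam * w - w**2 - lam**2) / (2 * (a - 1.0)),
                 lam**2 * (a + 1.0) / 2.0))
```

The derivative equals λ for small arguments (lasso-like shrinkage) and vanishes beyond aλ, which is what yields the unbiasedness of large estimated coefficients.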

If Z is centered and normally distributed with covariance matrix Σ_Z = I_q, the identity matrix, then τ_j = e_j, j = 1, ..., q, where e_j denotes the unit vector with 1 at position j, and α is sparse with a_j = γ^T τ_j = γ_j. So the model (4) is sparse. For model (5), the special case of model (4), we can apply the SCAD penalty method proposed by Fan and Li [5] to select variables in Z_K and estimate the parameters α and β simultaneously. The selected covariates and the corresponding parameters are denoted by Z_{K_n} and α_{K_n}; the resulting parameter estimators are denoted by α̂_{K_n} and β̂, respectively. Finally, we can use the simplified model

Y = β^T X + α_{K_n}^T Z_{K_n} + ζ_{K_n}  (12)

for model prediction. Under the condition of sparsity, its model size is much smaller than those of the multistep adjusted model (5) and the full model (1).

2.3. Asymptotic Properties of the Point Estimator. Let β_0, d_0, v_0, and g_{j0}(·), f_0(·) be the true values of β, d, v, and g_j(·), f(·), respectively, in model (3). Without loss of generality, we assume that g_{j0}(Z^(j)) = 0, j = s + 1, ..., K_0, and that g_{j0}(Z^(j)), j = 1, ..., s, are all nonzero components.

We suppose that g_j(Z^(j)), j = 1, ..., K_0, can be expressed as Σ_{k=1}^∞ d_jk Ψ_jk(Z^(j)) and f(U) can be expressed as Σ_{k=1}^∞ v_k Ψ_0k(U), where d_j and v belong to the Sobolev ellipsoid S(r, M) = {θ : Σ_{k=1}^∞ θ_k² k^{2r} ≤ M, M > 0, r > 0}.

The following theorem gives the consistency of the penalized SCAD estimators.

Theorem 1. Suppose that the regularity conditions (C1)-(C5) in the appendix hold and the number of terms L = O_p(n^{1/(2r+1)}). Then,

(i) ||β̂ − β_0|| = O_p(n^{−r/(2r+1)} + a_n),

(ii) ||ĝ_j(·) − g_{j0}(·)|| = O_p(n^{−r/(2r+1)} + a_n), j = 1, ..., K_0,

(iii) ||f̂(·) − f_0(·)|| = O_p(n^{−r/(2r+1)} + a_n),

where a_n = max_j {|p'_{λ_j}(||d_{j0}||_2)| : d_{j0} ≠ 0}.

From the last paragraph of Section 2.2 we know that, for the linear regression model with normally distributed Z, the multistep adjusted model (5) is a linear model. For orthogonal basis functions, such as power series, we have r = ∞, and then ||β̂ − β_0|| = O_p(n^{−1/2} + a_n), implying that the estimator β̂ has the same convergence rate as the SCAD estimator in Fan and Li [5].

Theorem 2. Suppose that the regularity conditions (C1)-(C6) in the appendix hold and the number of terms L = O_p(n^{1/(2r+1)}). Let λ_max = max_j {λ_j} and λ_min = min_j {λ_j}. If λ_max → 0 and n^{r/(2r+1)} λ_min → ∞ as n → ∞, then, with probability tending to 1, ĝ_j(·) = 0, j = s + 1, ..., K_0.

Remark 3. By Remark 1 of Fan and Li [5], we have that if λ_max → 0 as n → ∞, then a_n → 0. Hence from Theorems 1 and 2, by choosing proper tuning parameters, the variable selection method is consistent and the estimators of the nonparametric components achieve the optimal convergence rate, as if the subset of true zero coefficients were already known; see Stone [10].

Let d* = (d_1^T, ..., d_s^T)^T be the nonzero components of d, with corresponding covariates Ψ_i*, i = 1, ..., n. In addition, let

Σ = (1/σ²_{K_0}) { E(XX^T) − E(XΨ*^T) E^{−1}(Ψ*Ψ*^T) E(Ψ*X^T) − E(XΦ_0^T) E^{−1}(Φ_0Φ_0^T) E(Φ_0X^T) },  (13)

where σ²_{K_0} = Var(ζ_{K_0,i}) in the homoscedastic case and Φ_0 = Ψ_0 − E(Ψ_0Ψ*^T) E^{−1}(Ψ*Ψ*^T) Ψ*.

Theorem 4. Suppose that the regularity conditions (C1)-(C6) in the appendix hold and the number of terms L = O_p(n^{1/(2r+1)}). If Σ is invertible, then

√n (β̂ − β_0) →_D N(0, Σ^{−1}),  (14)

where "→_D" denotes convergence in distribution.

Remark 5. From Theorems 1 and 4, it can be found that the penalized estimators have the oracle property. Furthermore, the estimator of the parameter of interest has the same asymptotic distribution as that based on the correct submodel.

2.4. Some Issues on Implementation. In the adjusted model (4), the τ_j, j = 1, ..., K_0, are used. When the population distribution is not available, they need to be approximated by estimators. When Z is normally distributed and the eigenvalues λ_j, j = 1, ..., q, of the covariance matrix Σ_Z are different from each other, then √n(u_j − τ_j) is asymptotically N(0, V_j) with V_j = Σ_{i≠j} (λ_j λ_i / (λ_j − λ_i)²) τ_i τ_i^T, where u_j is the jth eigenvector of S = (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄)(Z_i − Z̄)^T with Z̄ = (1/n) Σ_{i=1}^n Z_i; see Anderson [11]. For the case when the population size is large and comparable with the sample size, if the covariance matrix is sparse, we can use the method in Rütimann and Bühlmann [12] or Cai and Liu [13] to estimate the covariance matrix. So we can use u_j to approximate τ_j. When the τ_j in model (4) are replaced by these consistent estimators, one can see that the approximation error can be neglected without changing the asymptotic properties.

The nonparametric parts g_l(Z^(l)) in the adjusted model depend on the univariate variables Z^(l), l = 1, ..., K_0. So one first needs to choose the number of steps K_0. In real implementation, we compute all q multiple correlation coefficients of Z^(l) (l = 1, ..., q) with (X, U). Then we choose the components R = {Z^(l) : |mcorr(Z^(l), (X, U))| > δ, l = 1, ..., q} for a given small number δ > 0, where mcorr(u, V) denotes the multiple correlation coefficient between u and V and can be approximated by its sample form; see Anderson [11].

There are some tuning parameters to choose in order to implement the two-stage remodeling procedure. Fan and Li [5] showed that the SCAD penalty with a = 3.7 performs well in a variety of situations. Hence, we use their suggestion throughout this paper. We still need to choose the positive integer L for the basis functions and the tuning parameters λ_j of the penalty functions. Similar to the adaptive lasso of Zou [14], we suggest taking λ_j = λ/||d̂_j^(0)||_2, where d̂_j^(0) is the initial estimator of d_j obtained by the ordinary least squares method based on the first term in (10). So the two remaining parameters L and λ can be selected simultaneously using the leave-one-out CV or GCV method; see Zhao and Xue [4] for more details.

3. Simulation Studies

In this section, we investigate the behavior of the newly proposed method by simulation studies.

3.1. Linear Model with Normally Distributed Covariates. The dimensions of the full model (1) and the submodel (2) are chosen to be 100 and 5, respectively. We set β = (0.5, 3.5, 2.5, 1.5, 4.0)^T and γ = (γ_1^T, γ_2^T, 0_55^T)^T, where γ_2 ~ Unif[−0.5, 0.5]^30, a 30-dimensional uniform distribution on [−0.5, 0.5]^30, and γ_1 is chosen in the following ways:

Case (I). γ_1 ~ Unif[0.5, 1.0]^10.

Case (II). γ_1 = (1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 2.0, 2.0, 2.0, 2.0)^T.

We assume that (X^T, Z^T)^T ~ N(μ, Σ), where μ is a fixed mean vector with entries 0 and 1 and Σ = (σ_ij) is given by

σ_ij = { 1.0, j = i, i = 1, ..., p + q;
         c,   j = i + p, i = 1, 3, ..., q;
         0,   otherwise, }

with c = 0.5 or c = 0.8. The error term ε is assumed to be normally distributed as N(0, 0.3²).

Here we denote the submodel (2) as model (I), the multistep adjusted linear model (5) as model (II), the two-stage model (12) as model (III), and the full model (1) as model (IV). We compare the mean square errors (MSEs) of the new two-stage estimator β̂_TS based on model (III) with those of the estimator β̂_S based on model (I), the multistep estimator β̂_M based on model (II), the SCAD estimator β̂_SCAD, and the least squares estimator β̂_F based on model (IV). We also compare the mean square prediction errors (MSPEs) of the above-mentioned models with the corresponding estimators.

The data are simulated from the full model (1) with sample size n = 100 and m = 1000 simulation replications. We use the sample-based PCA approximations to substitute the τ_j's. The parameter a in the SCAD penalty function is set to 3.7 and λ is selected by the leave-one-out CV method.
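The covariance design of this simulation can be sketched as follows. This is our own reading of the description above: unit variances on the diagonal, and correlation c between coordinate i and coordinate i + p for odd i ≤ q; the function name and defaults are illustrative.

```python
import numpy as np

def make_sigma(p=5, q=95, c=0.5):
    """Covariance of (X', Z')': 1 on the diagonal, c linking i and i+p for odd i."""
    d = p + q
    S = np.eye(d)
    for i in range(1, q + 1, 2):           # i = 1, 3, ..., q (1-based indices)
        S[i - 1, i - 1 + p] = c            # sigma_{i, i+p} = c
        S[i - 1 + p, i - 1] = c            # keep the matrix symmetric
    return S

Sigma = make_sigma(c=0.5)
```

Because the nonzero off-diagonal entries pair disjoint coordinates, the matrix decomposes into 2 × 2 blocks with off-diagonal c < 1 and is therefore positive definite.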

Table 1 reports the MSEs of the point estimators of the parameter β and the MSPEs of the model predictions. From the table, we have the following findings: (1) β̂_F has the largest MSEs and β̂_S takes the second place, and in nearly all cases the new estimator β̂_TS has the smallest MSEs. (2) When c = 0.5, the MSEs of β̂_SCAD are smaller than those of β̂_M, while when c = 0.8 they are larger than those of β̂_M. This shows that if the correlation between the covariates is strong, the MSEs of β̂_SCAD are larger than those of β̂_M, so the multistep adjustment is necessary, and the estimation and model prediction based on the two-stage model are significantly improved. (3) In cases (I) and (II) the simulation results show similar performance. (4) Similar to the trend of the MSEs of the five estimators, the MSPE of the two-stage adjusted model is the smallest among the five models.

In summary, Table 1 indicates that the two-stage adjusted linear model (12) performs much better than the full model, and better than the submodel, the SCAD-penalized model, and the multistep adjusted model.

3.2. Partially Linear Model with Nonnormally Distributed Covariates. The dimensions of the linear part in the full model (1) and the submodel (2) are chosen to be 50 and 5, respectively. We set β = (0.5, 3.5, 2.5, 1.5, 4.0)^T, γ = (γ_1^T, γ_2^T, 0_25^T)^T and f(u) = u² sin(3u), where

γ_1 = (0.5, 0.1, 0.8, 0.2, 0.5, 0.2, 0.6, 0.5, 0.1, 0.9)^T,

γ_2 ~ Unif[−0.3, 0.3]^10, a 10-dimensional uniform distribution on [−0.3, 0.3].

We assume that the covariates are distributed in the following two ways.

Case (I). (X^T, Z^T, U)^T ~ t_5(0, Σ), a 51-dimensional Student t distribution with df = 5 degrees of freedom, where Σ = (σ_ij) with

σ_ij = { 1.0,  j = i, i = 1, ..., p + q + 1;
         0.95, j = i + p, i = 1, 2, ..., q + 1;
         0.9,  j = i + p − 2, i = 1, 2, ..., q + 3;
         0,    otherwise. }

Case (II). X = (1/(1 + c))(W_1 + cV), Z = (Z_1^T, Z_2^T, Z_3^T, Z_4^T)^T with Z_1 = (1/(1 + c))(W_2 + cV), Z_2 = W_3, Z_3 = (1/(1 + c))(W_4 + cV), Z_4 = W_5, and U = W_5^(1), where W_1, W_2, W_3, W_4 ~ Unif[−1.0, 1.0]^5, W_5 ~ Unif[−1.0, 1.0]^30, V ~ Unif[−1.0, 1.0]^5, uniform distributions on [−1.0, 1.0], and the constant c = 0.1. All of W_1, W_2, W_3, W_4, W_5, and V are independent.

The error term ε is assumed to be normally distributed as N(0, 0.3²).

Here we denote the submodel (2) as model (I), the multistep adjusted additive partially linear model (3) as model (II), the two-stage model (11) as model (III), and the full model (1) as model (IV). We compare the mean square errors (MSEs) of the new two-stage estimator β̂_TS based on model (III) with those of the estimator β̂_S based on model (I), the estimator β̂_M based on model (II), and the least squares estimator β̂_F based on model (IV). We also compare the mean average square errors (MASEs) of the nonparametric estimators of f(·) and the mean square prediction errors (MSPEs) of the different models with the corresponding estimators.

The data are simulated from the full model (1) with sample size n = 100 and m = 500 simulation replications. We use the sample-based approximations of ICA; see Hyvärinen and Oja [15]. The parameter a in the SCAD penalty function is set to 3.7, and the number L and the parameter λ are selected by the GCV method. We use the standard Fourier orthogonal basis as the basis functions.
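A Fourier basis of the kind used for the series approximations in (9) can be sketched as follows. This is our simplification: the basis is taken orthonormal on [0, 1] with respect to the uniform density, so in practice each covariate would first be transformed to [0, 1] (e.g. by its empirical CDF); the function name and layout are illustrative.

```python
import numpy as np

def fourier_basis(u, L):
    """Return the n x L matrix (Psi_1(u), ..., Psi_L(u)), orthonormal on [0, 1]."""
    u = np.asarray(u, dtype=float)
    cols = []
    for k in range(1, L + 1):
        if k % 2 == 1:                      # odd k -> cosine terms
            cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * ((k + 1) // 2) * u))
        else:                               # even k -> sine terms
            cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * (k // 2) * u))
    return np.column_stack(cols)
```

On an equally spaced grid the empirical Gram matrix of these columns is close to the identity, which is the sample analogue of the orthonormality condition (8).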

Table 2 reports the MSEs of the point estimators of the parameter β, the MASEs of f̂(·) and the MSPEs of the model predictions. From the table, we have the following results: (1) β̂_F has the largest MSEs, much larger than the MSEs of the other estimators, and the new estimator β̂_TS always has the smallest MSEs. (2) The MASEs of f̂(·) have

Table 1: MSEs of the parameter β and MSPEs of the two-stage adjusted linear model (12) compared with the submodel, the SCAD-penalized model, the multistep adjusted model, and the full model.

No.         Item    β̂_S      β̂_SCAD   β̂_M      β̂_TS     β̂_F
Case (I)    MSEs    0.3079   0.0457   0.0660   0.0571   1.6105 x 10^3
c = 0.5             0.1763   0.0206   0.0346   0.0176   1.0940 x 10^3
                    0.1396   0.0481   0.0631   0.0461   4.2049 x 10^3
                    0.1870   0.0196   0.0349   0.0186   5.0183 x 10^3
                    0.1131   0.0517   0.0609   0.0430   6.2615 x 10^3
            MSPEs   3.4780   1.1896   1.6512   1.0679   3.0499 x 10^2
Case (I)    MSEs    0.1568   0.6191   0.0934   0.0826   1.2494 x 10^3
c = 0.8             0.6239   0.1060   0.0090   0.0083   1.0456 x 10^2
                    0.8829   0.8173   0.0895   0.1039   2.6368 x 10^2
                    0.5882   0.0919   0.0107   0.0100   7.6452 x 10^1
                    1.0799   0.9829   0.0961   0.0929   1.1610 x 10^3
            MSPEs   4.7930   2.6700   0.8354   0.7771   1.3223 x 10^2
Case (II)   MSEs    0.4272   0.0660   0.0849   0.0557   4.3002 x 10^2
c = 0.5             0.6371   0.0318   0.0499   0.0295   3.7893 x 10^3
                    0.4560   0.0715   0.0927   0.0588   1.2784 x 10^3
                    0.5926   0.0306   0.0491   0.0287   6.7354 x 10^3
                    0.9052   0.0734   0.0874   0.0583   2.5047 x 10^2
            MSPEs   6.8634   1.5096   2.0780   1.2077   5.0464 x 10^3
Case (II)   MSEs    0.6764   0.4263   0.1212   0.0960   1.3904 x 10^3
c = 0.8             0.9721   0.1060   0.0107   0.0102   4.0743 x 10^2
                    0.6242   0.4756   0.1146   0.1003   1.0498 x 10^3
                    1.0282   0.0954   0.0112   0.0098   5.6031 x 10^2
                    1.3420   0.5474   0.1341   0.1124   9.9632 x 10^2
            MSPEs   7.9928   2.1165   0.9514   0.8469   2.3110 x 10^2

Table 2: MSEs of the parameter β, MASEs of f̂(·) and MSPEs of the two-stage adjusted model (11) compared with the submodel, the multistep adjusted model, and the full model.

No.         Item    β̂_S      β̂_M            β̂_TS     β̂_F
Case (I)    MSEs    0.4352   5.0403         0.3267   2.9753 x 10^1
                    0.6859   1.2820 x 10^1  0.3328   1.4593 x 10^1
                    1.1152   8.1542         0.3723   1.4391 x 10^1
                    1.8489   7.2055         1.3194   2.4036 x 10^1
                    3.3079   1.6144 x 10^1  1.9989   4.8575 x 10^1
            MASEs   3.0887   5.9814         3.0175   3.0633
            MSPEs   4.6047   7.0331 x 10^1  3.5536   3.9648
Case (II)   MSEs    0.0377   0.6144         0.0191   —¹
                    0.0449   1.0876         0.0305   —
                    0.0332   3.7510         0.0246   —
                    0.0396   0.4324         0.0238   —
                    0.0512   1.1995         0.0335   —
            MASEs   0.4722   0.5220         0.4126   0.4380
            MSPEs   0.9221   9.3068         0.8053   —

¹ "—" denotes that the algorithm collapsed and returned no value.

a similar trend to the MSEs of the four estimators, although the differences are not very noticeable. (3) Similar to the MSEs of the estimators, the MSPEs of the two-stage adjusted model are the smallest among the four models. (4) In Case (II), the simulation results of models (I), (II) and (III) perform a little better than those in Case (I) because of the correlation structure among the covariates.

In summary, Table 2 indicates that the two-stage adjusted model (11) performs much better than the full model and the multistep adjusted model, and better than the submodel.

4. Some Remarks

In this paper, the main objective is to consistently estimate the parameter of interest β. When estimating the parameter of interest, its bias is mainly determined by the relevant variables, and its variance may be affected by the other variables. Because variable selection relies heavily on the sparsity of the parameter, when we directly consider the partially linear model, some irrelevant variables with nonzero coefficients may be selected into the final model. This may affect the efficiency and stability of the estimation of the parameter β. Thus, based on the prespecified submodel, a two-stage remodeling method is proposed. In the new remodeling procedure, the correlation among the covariates (X, Z) and the sparsity of the regression structure are fully used. So the final model is sufficiently simplified and conditionally unbiased. Based on the simplified model, the estimation and model prediction are significantly improved. Generally, after the first stage the adjusted model is an additive partially linear model. Therefore, the remodeling method can be applied to the partially linear regression model, with the linear regression model as a special case.

From the remodeling procedure, we can see that it can be directly applied to the additive partially linear model, in which the nonparametric function f(U) has a componentwise additive form. As for the general partially linear model with a multivariate nonparametric function, we should resort to multivariate nonparametric estimation methods. If the dimension of the covariate U is high, this may face "the curse of dimensionality".

In the procedure of model simplification, the orthogonal series estimation method is used. This is only for technical convenience, because the semiparametric penalized least squares (6) can easily be transformed into the parametric penalized least squares (10), from which the theoretical results are obtained. Other nonparametric methods, such as kernel and spline methods, can be used without any essential difficulty, but they cannot directly achieve this goal. Compared with the kernel method, it is somewhat difficult for the series method to establish the asymptotic normality result for the nonparametric component f(U) under primitive conditions.

Appendix

A. Some Conditions and Proofs

A.1. Regularity Conditions (C1)-(C6).

(C1) (Z, U) has finite nondegenerate compact support, denoted as supp(Z, U).

(C2) The density function r_j(t) of Z^(j) and r_0(t) of U satisfy 0 < L_1 ≤ r_j(t) ≤ L_2 < ∞ on their supports for 0 ≤ j ≤ K_0 for some constants L_1 and L_2, and they are continuously differentiable.

(C3) G(z, u) = E(XX^T | Z = z, U = u) and E(ζ²_{K_0} | Z, U) are continuous. For given z and u, G(z, u) is positive definite, and its eigenvalues are bounded.

(C4) sup_{(z,u)∈supp(Z,U)} E(||X||⁴ | Z = z, U = u) < ∞, Ef(U) = 0, and the first two derivatives of f(·) are Lipschitz continuous of order one.

(C5) b_n = max_j {|p''_{λ_j}(||d_{j0}||_2)| : d_{j0} ≠ 0} → 0 as n → ∞.

(C6) liminf_{n→∞} liminf_{||d_j||_2→0} λ_j^{−1} p'_{λ_j}(||d_j||_2) > 0 for j = s + 1, ..., K_0, where s satisfies γ^T E(E(Z | Z^(j)) E(Z^T | Z^(j))) γ > 0 for 1 ≤ j ≤ s and γ^T E(E(Z | Z^(j)) E(Z^T | Z^(j))) γ = 0 for s < j ≤ K_0.

Conditions (C1)-(C3) are regular constraints on the covariates, and condition (C4) places constraints on the regression structure, as in Härdle et al. [16]. Conditions (C5)-(C6) are assumptions on the penalty function similar to those used in Fan and Li [5] and Wang et al. [7].

A.2. Proof of Theorem 1. Let δ = n^{−r/(2r+1)} + a_n, β = β_0 + δT_1, d = d_0 + δT_2, v = v_0 + δT_3 and T = (T_1^T, T_2^T, T_3^T)^T. First, we shall prove that, for all ε > 0, there exists C > 0 such that P{inf_{||T||=C} F(β, d, v) > F(β_0, d_0, v_0)} ≥ 1 − ε.

Denote D(β, d, v) = L^{−1}{F(β, d, v) − F(β_0, d_0, v_0)}. Noting that Y_i = β_0^T X_i + d_0^T Ψ_i + v_0^T Ψ_0i + R(Z_i, U_i) + ζ_{K_0,i}, we have

D(β, d, v) = (1/L) Σ_{i=1}^n [ −2δ (ζ_{K_0,i} + R(Z_i, U_i)) (T_1^T X_i + T_2^T Ψ(Z_i) + T_3^T Ψ_0(U_i)) + δ² (T_1^T X_i + T_2^T Ψ(Z_i) + T_3^T Ψ_0(U_i))² ] + (n/L) Σ_{j=1}^{K_0} [ p_{λ_j}(||d_{j0} + δT_{2j}||_2) − p_{λ_j}(||d_{j0}||_2) ] ≡ I_1 + I_2 + I_3,  (A.1)

where R(Z_i, U_i) = Σ_{j=1}^{K_0} R_j(Z_i) + R_0(U_i) with R_j(Z_i) = g_j(Z_i^(j)) − d_j^T Ψ_j(Z_i^(j)), j = 1, ..., K_0, and R_0(U_i) = f(U_i) − v^T Ψ_0(U_i).

By conditions (C1) and (C2), the maximal squared bias of g_j(Z^(j)) satisfies

E(g_j(Z^(j)) − d_j^T Ψ_j(Z^(j)))² = Σ_{k=L+1}^∞ d²_{jk} ≤ L^{−2r} Σ_{k=L+1}^∞ d²_{jk} k^{2r} ≤ M L^{−2r},  (A.2)

so ||R_j(Z^(j))|| = O(L^{−r}). Similarly, ||R_0(U)|| = O(L^{−r}). Then

Σ_{i=1}^n R(Z_i, U_i) (T_1^T X_i + T_2^T Ψ(Z_i) + T_3^T Ψ_0(U_i)) = O_p(n K_0 L^{−r}) ||T||.  (A.3)

Noticing that E(ζ_{K_0} | X, Z, U) = 0, by Zhao and Xue [4], we have

(1/√n) Σ_{i=1}^n ζ_{K_0,i} (T_1^T X_i + T_2^T Ψ(Z_i) + T_3^T Ψ_0(U_i)) = O_p(||T||).  (A.4)

Hence

I_1 = (δ/L) [ O_p(n K_0 L^{−r}) + O_p(√n) ] ||T|| = O_p(1 + n^{r/(2r+1)} a_n) ||T||.  (A.5)

Similarly, we have

0 ≤ I_2 = O_p(n L^{−1} δ²) ||T||² = O_p(1 + 2 n^{r/(2r+1)} a_n + n^{2r/(2r+1)} a_n²) ||T||².  (A.6)

By properly choosing a sufficiently large C, I_2 dominates I_1 uniformly in ||T|| = C.

Using a Taylor expansion,

I_3 = (n/L) Σ_{j=1}^s [ δ p'_{λ_j}(||d_{j0}||_2) (d_{j0}^T T_{2j} / ||d_{j0}||_2) + δ² p''_{λ_j}(||d_{j0}||_2) T_{2j}^T T_{2j} (1 + o(1)) ] ≡ I_31 + I_32.  (A.7)

By simple calculations, we have

|I_31| ≤ (n δ a_n / L) Σ_{j=1}^s ||T_{2j}||_2 ≤ l_1 (n δ a_n √s / L) ||T|| = O_p(n^{r/(2r+1)} a_n + n^{2r/(2r+1)} a_n²) ||T||,
|I_32| ≤ (n δ² b_n / L) Σ_{j=1}^s ||T_{2j}||²_2 ≤ l_2 (n δ² b_n / L) ||T||²,  (A.8)

where l_1 and l_2 are some positive constants. We can find that I_31 is also dominated by I_2 uniformly in ||T|| = C, and under condition (C5) we have

0 ≤ |I_32| ≤ o_p(1 + 2 n^{r/(2r+1)} a_n + n^{2r/(2r+1)} a_n²) ||T||².  (A.9)

Hence, by choosing a sufficiently large C, P{inf_{||T||=C} F(β, d, v) > F(β_0, d_0, v_0)} ≥ 1 − ε, which implies that with probability at least 1 − ε there exists a local minimum of F(β, d, v) in the ball {β_0 + δT_1 : ||T_1|| ≤ C}. Denote the local minimizer as β̂; then

||β̂ − β_0|| = O_p(δ) = O_p(n^{−r/(2r+1)} + a_n).  (A.10)

With the same argument, there exists a local minimum in the ball {d_0 + δT_2 : ||T_2|| ≤ C}, and the local minimizer d̂ satisfies

||d̂ − d_0|| = O_p(n^{−r/(2r+1)} + a_n).  (A.11)

For the nonparametric components g_j(·), noticing that

||ĝ_j − g_{j0}||² = E{ĝ_j(z^(j)) − g_{j0}(z^(j))}² ≤ 2E{Ψ_j^T(z^(j)) d̂_j − Ψ_j^T(z^(j)) d_{j0}}² + 2E{R_{j0}(z^(j))}² = 2(d̂_j − d_{j0})^T (d̂_j − d_{j0}) + 2E{R_{j0}(z^(j))}²,  (A.12)

and it is known that

||R_j(Z^(j))|| = O(L^{−r}) = O(n^{−r/(2r+1)}),  (A.13)

we get

||ĝ_j − g_{j0}|| = O_p(n^{−r/(2r+1)} + a_n).  (A.14)

Similarly, there exists a local minimizer v̂ satisfying ||v̂ − v_0|| = O_p(n^{−r/(2r+1)} + a_n). Then we get ||f̂ − f_0|| = O_p(n^{−r/(2r+1)} + a_n).

A.3. Proof for Ieorem 2. When Xmdx —> 0, an = 0 for large n by the form of p^(w). tten by tteorem 1, it is sufficient to show that: with probability tending to 1 as n —> oo, for any P, it satisfies Hp-M = Op(n-r/(2r+1)), Q} satisfies ||0;--0;J = Op(n-r^2r+1^) with j = 1,... ,s, and v satisfies ||v - v0| = Op(n-r^2r+1^), for some small in = Cn-r^2r+1^,

dF{ß,9,v)

> o, for o<9: < in, j = s + !,..., Ko,

dF{ß,e,v)

< o, for - in<Qj <o, j = s+l,...,Ko.

(A.15)

So the minimizer of 0 , v) is obtained at 0^- = 0, j = s + In fact,

\[
\frac{\partial F(\beta,\theta,v)}{\partial\theta_j}
= -2\sum_{i=1}^{n}\Psi_j\bigl(Z_i^{(j)}\bigr)\Bigl(Y_i - \beta^T X_i - \theta^T\Psi(Z_i) - v^T\Phi_0(U_i)\Bigr)
+ n\,p'_{\lambda_j}\bigl(\|\theta_j\|_2\bigr)\frac{\theta_j}{\|\theta_j\|_2}, \quad (A.16)
\]
and substituting the model for $Y_i$, the first sum contributes $-2\sum_{i=1}^{n}\Psi_j(Z_i^{(j)})\Psi^T(Z_i)(\theta_0-\theta)$ together with analogous terms in $(\beta_0-\beta)$, $(v_0-v)$, and the approximation remainders, each of order $O_p(n^{(r+1)/(2r+1)})$.

Under the conditions $\liminf_{n\to\infty}\liminf_{\theta_j\to 0}\lambda_j^{-1}p'_{\lambda_j}(\|\theta_j\|_2) = C > 0$ and $\lambda_j n^{r/(2r+1)} \ge \lambda_{\min}n^{r/(2r+1)} \to \infty$, it follows that $\partial F(\beta,\theta,v)/\partial\theta_j = O_p\bigl(n\lambda_j(\theta_j/\|\theta_j\|_2)\bigr)$, so the sign of the derivative is determined by $\theta_j$.

Hence, with probability tending to 1, $\hat\theta_j = 0$, $j = s+1,\ldots,K_0$.

Then, under $\sup_z|\Psi_j(Z^{(j)})| = O(1)$, $\hat g_j(Z^{(j)}) = \Psi_j^T(Z^{(j)})\hat\theta_j \equiv 0$, $j = s+1,\ldots,K_0$.
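The sparsity argument above rests on the penalty derivative $p'_\lambda(\cdot)$ vanishing away from zero, so that $a_n = \max_j p'_{\lambda_j}(\|\theta_{j0}\|_2) = 0$ once $\lambda_{\max}$ is small. A minimal numerical sketch, assuming the standard SCAD form of Fan and Li [5] with $a = 3.7$ (the function name is illustrative):

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """Derivative p'_lam(w) of the SCAD penalty (Fan and Li, 2001)."""
    w = np.abs(w)
    # p'_lam(w) = lam for w <= lam, linearly decays to 0 on (lam, a*lam],
    # and is exactly 0 for w > a*lam.
    return lam * ((w <= lam)
                  + np.maximum(a * lam - w, 0.0) / ((a - 1.0) * lam) * (w > lam))

lam = 0.05
print(scad_deriv(1.0, lam))   # coefficient bounded away from 0: derivative is 0.0
print(scad_deriv(0.02, lam))  # small coefficient: penalized at the full rate lam
```

For a coefficient bounded away from zero the derivative is exactly zero once $a\lambda$ falls below it, while coefficients inside $[0,\lambda]$ are penalized at the full rate $\lambda$; this is what drives the sign pattern in (A.15).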

A.4. Proof for Theorem 4. By Theorems 1 and 2, we know that, as $n \to \infty$, with probability tending to 1, $F(\beta,\theta,v)$ attains its local minimum value at $\hat\beta$, $(\hat\theta^{*T}, 0^T)^T$, and $\hat v$. Let $F_{1n}(\beta,\theta,v) = \partial F(\beta,\theta,v)/\partial\beta$, $F_{2n}(\beta,\theta,v) = \partial F(\beta,\theta,v)/\partial\theta^*$, and $F_{3n}(\beta,\theta,v) = \partial F(\beta,\theta,v)/\partial v$; then

\[
F_{1n}\bigl(\hat\beta,\hat\theta,\hat v\bigr) = -2\sum_{i=1}^{n}X_i\Bigl(Y_i - \hat\beta^T X_i - \hat\theta^{*T}\Psi_i^* - \hat v^T\Phi_{0i}\Bigr) = 0, \quad (A.17)
\]
\[
F_{2n}\bigl(\hat\beta,\hat\theta,\hat v\bigr) = -2\sum_{i=1}^{n}\Psi_i^*\Bigl(Y_i - \hat\beta^T X_i - \hat\theta^{*T}\Psi_i^* - \hat v^T\Phi_{0i}\Bigr) + n\,p'_{\lambda_j}\bigl(\|\hat\theta_j\|_2\bigr)\frac{\hat\theta_j}{\|\hat\theta_j\|_2} = 0, \quad (A.18)
\]
\[
F_{3n}\bigl(\hat\beta,\hat\theta,\hat v\bigr) = -2\sum_{i=1}^{n}\Phi_{0i}\Bigl(Y_i - \hat\beta^T X_i - \hat\theta^{*T}\Psi_i^* - \hat v^T\Phi_{0i}\Bigr) = 0. \quad (A.19)
\]

From (A.17), it yields that
\[
\sum_{i=1}^{n}X_i\Bigl\{\bigl(\beta_0-\hat\beta\bigr)^T X_i + \bigl(\theta_0^*-\hat\theta^*\bigr)^T\Psi_i^* + \bigl(v_0-\hat v\bigr)^T\Phi_{0i} + R^*\bigl(Z_i,U_i\bigr) + \varepsilon_i\Bigr\} = 0, \quad (A.20)
\]

where $R^*(Z_i,U_i) = \sum_{j=1}^{s}R_{j0}\bigl(Z_i^{(j)}\bigr) + R_0(U_i)$. Applying the Taylor expansion, we get
\[
p'_{\lambda_j}\bigl(\|\hat\theta_j\|_2\bigr) = p'_{\lambda_j}\bigl(\|\theta_{j0}\|_2\bigr) + \Bigl\{p''_{\lambda_j}\bigl(\|\theta_{j0}\|_2\bigr) + o_p(1)\Bigr\}\bigl(\hat\theta^* - \theta_0^*\bigr). \quad (A.21)
\]

Furthermore, condition (C5) implies that $p''_{\lambda_j}(\|\theta_{j0}\|_2) = o_p(1)$, and noting that $p'_{\lambda_j}(\|\theta_{j0}\|_2) = 0$ as $\lambda_{\max} \to 0$, then $p'_{\lambda_j}(\|\hat\theta_j\|_2)\hat\theta_j/\|\hat\theta_j\|_2 = o_p(\hat\theta^* - \theta_0^*)$. So from (A.18), it yields
\[
\sum_{i=1}^{n}\Psi_i^*\Bigl\{\bigl(\beta_0-\hat\beta\bigr)^T X_i + \bigl(\theta_0^*-\hat\theta^*\bigr)^T\Psi_i^* + \bigl(v_0-\hat v\bigr)^T\Phi_{0i} + R^*\bigl(Z_i,U_i\bigr) + \varepsilon_i\Bigr\} + o_p\bigl(\theta_0^*-\hat\theta^*\bigr) = 0. \quad (A.22)
\]

Let $\Phi_n = n^{-1}\sum_{i=1}^{n}\Psi_i^*\Psi_i^{*T}$, $\Gamma_n = n^{-1}\sum_{i=1}^{n}\Psi_i^* X_i^T$, and $\Pi_n = n^{-1}\sum_{i=1}^{n}\Psi_i^*\Phi_{0i}^T$; then we have
\[
\hat\theta^* - \theta_0^* = \bigl[\Phi_n + o_p(1)\bigr]^{-1}\Bigl\{\Gamma_n\bigl(\beta_0-\hat\beta\bigr) + \Pi_n\bigl(v_0-\hat v\bigr) + \frac{1}{n}\sum_{i=1}^{n}\Psi_i^*\bigl(R^*(Z_i,U_i) + \varepsilon_i\bigr)\Bigr\}. \quad (A.23)
\]

Substituting (A.23) into (A.20), it yields
\[
\sum_{i=1}^{n}X_i\Bigl\{\bigl(X_i - \Gamma_n^T\Phi_n^{-1}\Psi_i^*\bigr)^T\bigl(\hat\beta-\beta_0\bigr) + \bigl(\Phi_{0i} - \Pi_n^T\Phi_n^{-1}\Psi_i^*\bigr)^T\bigl(\hat v-v_0\bigr)\Bigr\} + o_p\bigl(\hat\beta-\beta_0\bigr) + o_p\bigl(\hat v-v_0\bigr)
= \sum_{i=1}^{n}X_i\Bigl\{\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\bigl(\Phi_n^{-1}+o_p(1)\bigr)A_n\Bigr\}, \quad (A.24)
\]
where $A_n = n^{-1}\sum_{i=1}^{n}\Psi_i^*\bigl(R^*(Z_i,U_i)+\varepsilon_i\bigr)$. From (A.19), it yields that

\[
\sum_{i=1}^{n}\Phi_{0i}\Bigl\{\bigl(\beta_0-\hat\beta\bigr)^T X_i + \bigl(\theta_0^*-\hat\theta^*\bigr)^T\Psi_i^* + \bigl(v_0-\hat v\bigr)^T\Phi_{0i} + R^*\bigl(Z_i,U_i\bigr) + \varepsilon_i\Bigr\} = 0. \quad (A.25)
\]
Substituting (A.23) into (A.25), it yields
\[
\sum_{i=1}^{n}\Phi_{0i}\Bigl\{\bigl(X_i - \Gamma_n^T\Phi_n^{-1}\Psi_i^*\bigr)^T\bigl(\hat\beta-\beta_0\bigr) + \bigl(\Phi_{0i} - \Pi_n^T\Phi_n^{-1}\Psi_i^*\bigr)^T\bigl(\hat v-v_0\bigr)\Bigr\} + o_p\bigl(\hat\beta-\beta_0\bigr) + o_p\bigl(\hat v-v_0\bigr)
= \sum_{i=1}^{n}\Phi_{0i}\Bigl\{\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\bigl(\Phi_n^{-1}+o_p(1)\bigr)A_n\Bigr\}. \quad (A.26)
\]

Noting that
\[
\frac{1}{n}\sum_{i=1}^{n}\Pi_n^T\Phi_n^{-1}\Psi_i^*\Bigl\{\Phi_{0i}^T - \Psi_i^{*T}\Phi_n^{-1}\Pi_n\Bigr\} = 0,
\qquad
\frac{1}{n}\sum_{i=1}^{n}\Pi_n^T\Phi_n^{-1}\Psi_i^*\Bigl\{\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\Phi_n^{-1}A_n\Bigr\} = 0, \quad (A.27)
\]

Equation (A.26) can be rewritten as

\[
\frac{1}{n}\sum_{i=1}^{n}\tilde\Phi_{0i}\Bigl\{\tilde X_i^T\bigl(\hat\beta-\beta_0\bigr) + \tilde\Phi_{0i}^T\bigl(\hat v-v_0\bigr)\Bigr\} + o_p\bigl(\hat\beta-\beta_0\bigr) + o_p\bigl(\hat v-v_0\bigr)
= \frac{1}{n}\sum_{i=1}^{n}\tilde\Phi_{0i}\Bigl\{\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\bigl(\Phi_n^{-1}+o_p(1)\bigr)A_n\Bigr\}, \quad (A.28)
\]
where $\tilde X_i = X_i - \Gamma_n^T\Phi_n^{-1}\Psi_i^*$ and $\tilde\Phi_{0i} = \Phi_{0i} - \Pi_n^T\Phi_n^{-1}\Psi_i^*$. Let $S_n = n^{-1}\sum_{i=1}^{n}\tilde\Phi_{0i}\tilde\Phi_{0i}^T$; then we have
\[
\hat v - v_0 = -\bigl[S_n + o_p(1)\bigr]^{-1}\Bigl[\frac{1}{n}\sum_{i=1}^{n}\tilde\Phi_{0i}\tilde X_i^T\Bigr]\bigl(\hat\beta-\beta_0\bigr)
+ S_n^{-1}\frac{1}{n}\sum_{i=1}^{n}\tilde\Phi_{0i}\Bigl(\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\Phi_n^{-1}A_n\Bigr). \quad (A.29)
\]

Substituting (A.29) into (A.24), and noting that
\[
\frac{1}{n}\sum_{i=1}^{n}\Gamma_n^T\Phi_n^{-1}\Psi_i^*\Bigl\{\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\Phi_n^{-1}A_n\Bigr\} = 0, \quad (A.30)
\]
it is easy to show that
\[
\Bigl(\Omega_n - \Upsilon_n^T S_n^{-1}\Upsilon_n + o_p(1)\Bigr)\sqrt{n}\,\bigl(\hat\beta-\beta_0\bigr)
= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\breve X_i\Bigl(\varepsilon_i + R^*\bigl(Z_i,U_i\bigr) - \Psi_i^{*T}\bigl[\Phi_n^{-1}+o_p(1)\bigr]A_n\Bigr)
= I_1 + I_2 + I_3, \quad (A.31)
\]
where $\Omega_n = n^{-1}\sum_{i=1}^{n}\tilde X_i\tilde X_i^T$, $\Upsilon_n = n^{-1}\sum_{i=1}^{n}\tilde\Phi_{0i}\tilde X_i^T$, $\breve X_i = \tilde X_i - \Upsilon_n^T S_n^{-1}\tilde\Phi_{0i}$, and $I_1$, $I_2$, $I_3$ collect the $\varepsilon_i$, $A_n$, and $R^*$ terms, respectively.

Using the Central Limit Theorem, we can obtain
\[
I_1 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\breve X_i\varepsilon_i \stackrel{d}{\longrightarrow} N\bigl(0,\sigma_0^2\,\Sigma_0\bigr), \quad (A.32)
\]
where "$\stackrel{d}{\longrightarrow}$" means the convergence in distribution and
\[
\Sigma_0 = E\bigl(\tilde X\tilde X^T\bigr) - E\bigl(\tilde X\tilde\Phi_0^T\bigr)\bigl[E\bigl(\tilde\Phi_0\tilde\Phi_0^T\bigr)\bigr]^{-1}E\bigl(\tilde\Phi_0\tilde X^T\bigr). \quad (A.33)
\]

In addition, noting that $\sum_{i=1}^{n}\tilde X_i\Psi_i^{*T} = 0$ and $\sum_{i=1}^{n}\tilde\Phi_{0i}\Psi_i^{*T} = 0$, so that $\sum_{i=1}^{n}\breve X_i\Psi_i^{*T} = 0$, we have $I_2 = 0$. Furthermore, we have

\[
I_3 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Bigl\{X_i - E\bigl(\Gamma_n^T\bigr)E^{-1}\bigl(\Phi_n\bigr)\Psi_i^*\Bigr\}R^*\bigl(Z_i,U_i\bigr)
+ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Bigl\{E\bigl(\Gamma_n^T\bigr)E^{-1}\bigl(\Phi_n\bigr) - \Gamma_n^T\Phi_n^{-1}\Bigr\}\Psi_i^* R^*\bigl(Z_i,U_i\bigr)
- \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Upsilon_n^T S_n^{-1}\tilde\Phi_{0i}R^*\bigl(Z_i,U_i\bigr)
= I_{31} + I_{32} + I_{33}. \quad (A.34)
\]

Invoking $E\{[X_i - E(\Gamma_n^T)E^{-1}(\Phi_n)\Psi_i^*]\Psi_i^{*T}\} = 0$, then by Zhao and Xue [4], we have
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl(X_i - E\bigl(\Gamma_n^T\bigr)E^{-1}\bigl(\Phi_n\bigr)\Psi_i^*\bigr)\Psi_i^{*T} = O_p(1). \quad (A.35)
\]

This, together with $\|\Psi_j(Z^{(j)})\| = O(1)$ and $\|R^*(Z,U)\| = o(1)$, gives $I_{31} = o_p(1)$. Similarly, $I_{32} = o_p(1)$. Noting that $(1/\sqrt{n})\sum_{i=1}^{n}\Upsilon_n^T S_n^{-1}\tilde\Phi_{0i}\Psi_i^{*T} = 0$, the same argument gives $I_{33} = o_p(1)$. Hence, we get $I_3 = o_p(1)$.

By the law of large numbers, we have $(1/n)\sum_{i=1}^{n}\breve X_i\breve X_i^T \stackrel{p}{\longrightarrow} \Sigma_0$, where "$\stackrel{p}{\longrightarrow}$" means the convergence in probability. Then using the Slutsky theorem, we get $\sqrt{n}\bigl(\hat\beta - \beta_0\bigr) \stackrel{d}{\longrightarrow} N\bigl(0,\sigma_0^2\Sigma_0^{-1}\bigr)$.
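As an informal sanity check of this conclusion, one can fit a toy partially linear model by least squares on the augmented design $[X, \Psi]$ and verify that the parametric part is recovered at the root-$n$ rate. The polynomial basis and all constants below are illustrative assumptions, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
beta0 = np.array([1.0, -0.5])
X = rng.normal(size=(n, 2))
Z = rng.uniform(size=n)
g = np.sin(2 * np.pi * Z)                     # nonparametric component
y = X @ beta0 + g + rng.normal(scale=0.5, size=n)

# crude polynomial basis standing in for the spline basis Psi
Psi = np.vander(Z, 6, increasing=True)
D = np.hstack([X, Psi])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
beta_hat = coef[:2]
print(beta_hat)  # close to beta0
```

Since $X$ is independent of $Z$ here, the basis-approximation bias barely contaminates $\hat\beta$, matching the role of the remainder term $R^*$ being asymptotically negligible in the proof.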

Acknowledgment

Lin and Zeng's research is supported by NNSF projects (11171188, 10921101, and 11231005) of China, NSF and SRRF projects (ZR2010AZ001 and BS2011SF006) of Shandong Province of China, and the K. C. Wong-HKBU Fellowship Programme for Mainland China Scholars 2010-11. Wang's research is supported by NSF project (ZR2011AQ007) of Shandong Province of China.

References

[1] X. T. Shen, H.-C. Huang, and J. Ye, "Inference after model selection," Journal of the American Statistical Association, vol. 99, no. 467, pp. 751-762, 2004.

[2] Y. Gai, L. Lin, and X. Wang, "Consistent inference for biased sub-model of high-dimensional partially linear model," Journal of Statistical Planning and Inference, vol. 141, no. 5, pp. 1888-1898, 2011.

[3] Y. Zeng, L. Lin, and X. Wang, "Multi-step-adjustment consistent inference for biased sub-model of multidimensional linear regression," Acta Mathematica Scientia, vol. 32, no. 6, pp. 1019-1031, 2012 (Chinese).

[4] P. Zhao and L. Xue, "Variable selection for semiparametric varying coefficient partially linear models," Statistics & Probability Letters, vol. 79, no. 20, pp. 2148-2157, 2009.

[5] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348-1360, 2001.

[6] L. Wang, G. Chen, and H. Li, "Group SCAD regression analysis for microarray time course gene expression data," Bioinformatics, vol. 23, no. 12, pp. 1486-1494, 2007.

[7] L. Wang, H. Li, and J. Z. Huang, "Variable selection in non-parametric varying-coefficient models for analysis of repeated measurements," Journal of the American Statistical Association, vol. 103, no. 484, pp. 1556-1569, 2008.

[8] E. F. Simas Filho and J. M. Seixas, "Nonlinear independent component analysis: theoretical review and applications," Learning and Nonlinear Models, vol. 5, no. 2, pp. 99-120, 2007.

[9] J. Fan, Y. Feng, and R. Song, "Nonparametric independence screening in sparse ultra-high-dimensional additive models," Journal of the American Statistical Association, vol. 106, no. 494, pp. 544-557, 2011.

[10] C. J. Stone, "Optimal global rates of convergence for nonparametric regression," The Annals of Statistics, vol. 10, no. 4, pp. 1040-1053, 1982.

[11] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, 3rd edition, 2003.

[12] P. Rütimann and P. Bühlmann, "High dimensional sparse covariance estimation via directed acyclic graphs," Electronic Journal of Statistics, vol. 3, pp. 1133-1160, 2009.

[13] T. Cai and W. D. Liu, "Adaptive thresholding for sparse covariance matrix estimation," Journal of the American Statistical Association, vol. 106, no. 494, pp. 672-684, 2011.

[14] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.

[15] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Computation, vol. 9, no. 7, pp. 1483-1492, 1997.

[16] W. Härdle, H. Liang, and J. T. Gao, Partially Linear Models, Physica, Heidelberg, Germany, 2000.
