Hindawi Publishing Corporation Journal of Applied Mathematics Volume 2014, Article ID 360249, 16 pages http://dx.doi.org/10.1155/2014/360249

Research Article

New Inference Procedures for Semiparametric Varying-Coefficient Partially Linear Cox Models

Yunbei Ma1 and Xuan Luo2

1 School of Statistics and Research Center of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China

2 Information Engineering University, Zhengzhou, Henan 450001, China

Correspondence should be addressed to Yunbei Ma; myb@swufe.edu.cn Received 14 October 2013; Accepted 19 March 2014; Published 25 May 2014 Academic Editor: Jinyun Yuan

Copyright © 2014 Y. Ma and X. Luo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In biomedical research, one major objective is to identify risk factors and study their risk impacts, as this identification can help clinicians both to make proper decisions and to increase the efficiency of treatments and resource allocation. A two-step penalty-based procedure is proposed to select the linear regression coefficients of the linear components and to identify the significant nonparametric varying-coefficient functions for semiparametric varying-coefficient partially linear Cox models. It is shown that the resulting penalized estimators of the linear regression coefficients are asymptotically normal and have oracle properties, and that the resulting estimators of the varying-coefficient functions achieve optimal convergence rates. A simulation study and an empirical example are presented for illustration.

1. Introduction

To balance model flexibility and specificity, a variety of semiparametric models have been proposed in survival analysis. For example, Huang [1] studied partially linear Cox models, and Cai et al. [2] and Fan et al. [3] studied the Cox proportional hazards model with varying coefficients. Tian et al. [4] proposed an estimation procedure for the Cox model with time-varying coefficients. Perhaps the most appealing model is the semiparametric varying-coefficient model; its merits include easy interpretation, a flexible structure, allowance for interaction between covariates, and dimension reduction of the nonparametric components. Recently, much effort has been devoted in this direction. Cai et al. [2] considered the Cox proportional hazards model with a semiparametric varying-coefficient structure. Yin et al. [5] proposed the semiparametric additive hazards model with varying coefficients.

In biomedical research and clinical trials, one major objective is to identify risk factors and study their risk impacts, as this identification can help clinicians to properly

make proper decisions and increase the efficiency of treatment and resource allocation. This is essentially a variable selection procedure in spirit. When the outcomes are censored, selection of significant risk factors becomes more complicated and challenging than in the case where all outcomes are completely observed. However, important progress has been made in the recent literature. For instance, Fan and Li [6] extended the nonconcave penalized likelihood approach proposed by Fan and Li [7] to the Cox proportional hazards model and the Cox proportional hazards frailty model. Johnson et al. [8, 9] proposed procedures for selecting variables in semiparametric linear regression models for censored data. However, these papers did not consider variable selection with functional coefficients; while Wang et al. [10] extended the application of penalized likelihood to the varying-coefficient setting and studied the asymptotic distributions of the estimators, they only focused on the SCAD penalty for completely observed data. Du et al. [11] proposed a penalized variable selection procedure for Cox models with a semiparametric relative risk.

In this paper, we study variable selection for both the regression parameters and the varying coefficients in the semiparametric varying-coefficient additive hazards model:

$$ \lambda(t \mid Z, V, W) = \lambda_0(t) + \beta^T(W(t))\,Z(t) + \alpha^T V(t) + g(W(t)), \tag{1} $$

where V(t) is a vector of covariates with a linear effect on the hazard function λ(t), W(t) is the main exposure variable of interest, whose effect on the hazard might be nonlinear, and Z(·) = (Z_1(·), …, Z_p(·))^T is a vector of covariates that may interact with the exposure covariate W(·). λ_0(·) is the baseline hazard function, g(·) is an unspecified smooth function, and α is a d-vector of constant parameters. We set g(0) = 0 to ensure that the model is identifiable. This model covers commonly used semiparametric model settings. For instance, if β(·) = 0 and g(·) = 0, model (1) reduces to Lin and Ying's additive hazards model [12], and to Aalen's additive hazards model when the baseline hazard function λ_0(t) also equals zero [13, 14]. Furthermore, when the exposure covariate W(·) is time t itself, model (1) reduces to the partly parametric additive hazards model if λ_0(t) = g(t) = 0 [15, 16] and to the time-varying coefficient additive hazards model [17].
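Because the hazard in (1) is additive and, for time-fixed covariates, constant in t, a failure time can be drawn directly as an exponential variate. A minimal sketch (the particular β, g, α, and λ_0 below are illustrative placeholders borrowed from the simulation design in Section 4, not part of the model definition):

```python
import numpy as np

rng = np.random.default_rng(0)

def hazard(w, z, v, lam0=0.5):
    """Additive hazard lambda_0 + beta(w) z + alpha'v + g(w) of model (1);
    beta, g, alpha, lam0 are illustrative choices, not the paper's defaults."""
    beta = 1.2 + np.sin(2.0 * w)                  # varying coefficient beta(w), p = 1
    g = 0.3 * w ** 2                              # nonparametric effect g(w)
    alpha = np.array([1.5, 0.0, 1.0, 0.0, 0.0])   # linear coefficients, d = 5
    return lam0 + beta * z + alpha @ v + g

def draw_failure_time(w, z, v):
    # Time-fixed covariates => constant hazard => exponential failure time.
    return rng.exponential(1.0 / hazard(w, z, v))

t = draw_failure_time(w=1.0, z=0.5, v=np.zeros(5))
```

For time-varying covariates the hazard is piecewise constant and the draw would proceed interval by interval; the sketch above covers only the time-fixed case used later in the simulations.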

To select the significant elements of α and the significant coefficient functions among β(·), we require an estimation procedure for the regression parameters α and β(·). However, this poses a challenge because it is impossible to obtain root-n consistent penalized estimators for the nonzero components of α by simultaneously penalizing β(·) and α. To achieve this goal, we propose the following two-step estimation procedure for selecting the relevant variables. In Step 1, we estimate α by using a profile penalized least squares function after locally approximating the nonparametric functions β(·) and g(·). In Step 2, given the penalized estimator of α from Step 1, we develop a penalized least squares function for β(·) by using a basis expansion. We will demonstrate that the proposed estimation procedures have oracle properties and that all penalized estimators of the nonzero components achieve their optimal convergence rates.

This paper is organized as follows. Section 2 introduces the SCAD-based variable selection procedure for the parametric components and establishes the asymptotic normality for the resulting penalized estimators. Section 3 discusses a variable selection procedure for the coefficient functions by approximating these functions by using the B-spline approach. The simulation studies and an application of the proposed methods in a real data example are included in Sections 4 and 5, respectively. The technical lemmas and proofs of the main results are given in the appendix.

2. Penalized Least Squares Based Variable Selection for Parametric Components

Let T_i denote the potential failure time, C_i the potential censoring time, and X_i = min(T_i, C_i) the observed time for the ith individual, i = 1, …, n. Let Δ_i be an indicator that equals 1 if X_i is a failure time and 0 otherwise. Let F_{t,i} represent the failure, censoring, and covariate information up to time t for the ith individual. The observed data structure is {X_i, Δ_i, Z_i(t), V_i(t), W_i(t), i = 1, …, n}. Assume that T and C are conditionally independent given the covariates and that the observation period is [0, τ], where τ is a constant denoting the time of the end of the study.

Let N_i(t) = I(X_i ≤ t, Δ_i = 1) denote the counting process corresponding to T_i and let Y_i(t) = I(X_i ≥ t). Let the filtration {F_t : t ∈ [0, τ]} be the history up to time t; that is, F_t = σ{X_i ≤ u, Z_i(u), V_i(u), W_i(u), Y_i(u), Δ_i, 0 ≤ u ≤ t, i = 1, …, n}. Write M_i(t) = N_i(t) − ∫_0^t Y_i(u)λ_i(u) du. Then M_i(t) is a martingale with respect to F_t. For ease of presentation, we drop the dependence of the covariates on time. The methods and proofs in this paper are applicable to external time-dependent covariates [18].

Here we use the technique of local linear fitting [19]. Suppose that β(·) and g(·) are smooth enough to admit Taylor expansions: for each w_0 ∈ W, the support of W, and for w in a neighborhood of w_0,

$$ \beta(w) \approx \beta(w_0) + \beta'(w_0)(w - w_0), \qquad g(w) \approx g(w_0) + g'(w_0)(w - w_0). \tag{2} $$

Then we can approximate the hazard ratio function given in (1) by

$$ \lambda(t \mid Z_i, V_i, W_i) \approx \lambda_0^*(t, w_0) + \eta^T(w_0)\,Z_i^*(w_0) + \alpha^T V_i, \tag{3} $$

where η(w_0) = (β^T(w_0), (β'(w_0))^T, g'(w_0))^T, Z_i^*(w_0) = (Z_i^T, Z_i^T(W_i − w_0), (W_i − w_0))^T, and λ_0^*(t, w_0) = λ_0(t) + g(w_0).

Let H be the (2p+1) × (2p+1) diagonal matrix whose first p diagonal elements equal 1 and whose remaining p+1 diagonal elements equal h. If α is given, then by using counting-process notation, similarly to [5], we can obtain an estimator of η(w_0) at each w_0 ∈ W by solving the estimating equation whose objective function is given as follows:

$$ U_n(\eta, w_0) = \sum_{i=1}^n \int_0^\tau K_h(W_i - w_0)\, H^{-1}\{Z_i^*(w_0) - \bar Z(t, w_0)\}\left[dN_i(t) - Y_i(t)\{\eta^T(w_0) Z_i^*(w_0) + \alpha^T V_i\}\,dt\right], \tag{4} $$

where

$$ \bar Z(t, w_0) = \frac{\sum_{i=1}^n K_h(W_i - w_0)\,Y_i(t)\,Z_i^*(w_0)}{\sum_{i=1}^n K_h(W_i - w_0)\,Y_i(t)}, $$

and K_h(·) = K(·/h)/h with K(·) a kernel function and h a bandwidth. To avoid the technicality of tail problems, only data up to a finite time point τ are used.

Denote the solution of U_n(η, w_0) = 0 by η̂(w_0, α), which can be expressed as follows:

$$ \hat\eta(w_0, \alpha) = \left[\frac{1}{n}\sum_{i=1}^n \int_0^\tau K_h(W_i - w_0)\,H^{-1}\{Z_i^*(w_0) - \bar Z(t, w_0)\}^{\otimes 2}\,Y_i(t)\,dt\right]^{-1} \frac{1}{n}\sum_{i=1}^n \int_0^\tau K_h(W_i - w_0)\,H^{-1}\{Z_i^*(w_0) - \bar Z(t, w_0)\}\left[dN_i(t) - Y_i(t)\,V_i^T\alpha\,dt\right] \overset{\text{def}}{=} M_{n1}^{-1}(w_0)\{S_{n1}(w_0) - S_{n2}(w_0)\,\alpha\}. \tag{5} $$

Here and below, a^{⊗k} = 1, a, aa^T for any vector a and for k = 0, 1, 2, respectively.
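At a fixed w_0 the closed form (5) is a kernel-weighted least squares solve in the expanded design Z_i^*(w_0). A simplified sketch with a generic working response y standing in for the counting-process integrals (the Epanechnikov kernel and the data arrays are assumptions for illustration, not the paper's prescription):

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def local_design(Z, W, w0):
    """Expanded covariate Z_i*(w0) = (Z_i, Z_i (W_i - w0), W_i - w0) from (3)."""
    d = (W - w0)[:, None]
    return np.hstack([Z, Z * d, d])

def eta_hat(Z, W, y, w0, h):
    """Kernel-weighted least squares analogue of (5); y is a working
    response replacing the martingale increments dN_i - Y_i V_i' alpha dt."""
    X = local_design(Z, W, w0)
    k = epanechnikov((W - w0) / h) / h          # K_h(W_i - w0)
    A = X.T @ (X * k[:, None])                  # analogue of M_n1(w0)
    b = X.T @ (k * y)                           # analogue of S_n1 - S_n2 alpha
    return np.linalg.solve(A, b)                # approx (beta(w0), beta'(w0), g'(w0))
```

The H scaling in (5) only standardizes the derivative components and cancels in the fitted values, so it is omitted in this sketch.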

As a consequence, we can estimate fi(-) and g(-) presuming that a is known:

ß(W„ a)

= (lp, 0pXp, 0p) Mn\ (W,) foi (W,) - Sn2 (W,) «}

^Tni (W,)-Tn2 (W,)<

g(W„ a)

o (0pxp,0pxp,1p)Mnl (wo)

X {Sni (wo) - Sn2 (wo) «} dWo = Tn3 (W,)-Tn4 (W,) «•

2.1. Variable Selection. Notice that if α is given, then from (7) we can estimate Λ_0^*(t, W_i) = ∫_0^t λ_0^*(u, W_i) du by

$$ \hat\Lambda_0^*(t, W_i, \alpha) = \int_0^t \frac{\sum_{i=1}^n\left[dN_i(u) - Y_i(u)\{Z_i^T T_{n1}(W_i) + T_{n3}(W_i)\}\,du\right]}{\sum_{i=1}^n Y_i(u)} - \int_0^t \frac{\sum_{i=1}^n Y_i(u)\{V_i^T - Z_i^T T_{n2}(W_i) - T_{n4}(W_i)\}\,\alpha\,du}{\sum_{i=1}^n Y_i(u)} \overset{\text{def}}{=} \int_0^t \frac{\sum_{i=1}^n\left[dN_i(u) - Y_i(u)\{G_{ni} + V_i^{*T}(W_i)\,\alpha\}\,du\right]}{\sum_{i=1}^n Y_i(u)}. \tag{8} $$

Given the observed data and based on (7) and (8), β̂(·), ĝ(·), and the cumulative function Λ̂_0^*(·) can be approximately expressed as functions of α. These expressions inspire us to estimate α by minimizing the following objective function with respect to α:

$$ L_n(\alpha) = \frac{1}{2n}\sum_{i=1}^n\left(\int_0^\tau\left[dN_i(t) - Y_i(t)\,d\hat\Lambda_0^*(t, W_i, \alpha) - Y_i(t)\{Z_i^T\hat\beta(W_i, \alpha) + \hat g(W_i, \alpha) + V_i^T\alpha\}\,dt\right]\right)^2. \tag{9} $$

Accordingly, our penalized objective function is defined as follows:

$$ L_n^*(\alpha) = L_n(\alpha) + n\sum_{j=1}^d p_{\lambda_n}(|\alpha_j|). \tag{10} $$

Thus, minimizing L_n^*(α) with respect to α yields a penalized least squares estimator of α.
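The penalty p_{λ_n} in (10) can be the SCAD, hard thresholding, or L_1 penalty compared in Section 4. As an illustration, a sketch of the SCAD first derivative p'_λ(θ), the quantity entering a_n and b_n below (the concavity parameter a = 3.7 is the conventional choice of Fan and Li [7]; this code is illustrative, not the authors' implementation):

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """SCAD derivative p'_lambda(theta) for theta >= 0:
    equals lam on [0, lam], decays linearly on (lam, a*lam], and is
    zero beyond a*lam, so large coefficients are left unpenalized."""
    theta = np.asarray(theta, dtype=float)
    return np.where(
        theta <= lam,
        lam,
        np.maximum(a * lam - theta, 0.0) / (a - 1.0),
    )
```

The vanishing derivative for θ > aλ is exactly what makes a_n = 0 for large n in the oracle arguments, while the L_1 penalty keeps p'_λ(θ) = λ everywhere.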

Let a_n = max{p'_{λ_n}(|α_{0j}|) : α_{0j} ≠ 0} and b_n = max{p''_{λ_n}(|α_{0j}|) : α_{0j} ≠ 0}. We now establish the oracle properties of the resulting estimators. Let α_0 = (α_{10}^T, α_{20}^T)^T be the true value of α, with the first subvector α_{10}, of length s, containing all the nonzero elements of α and α_{20} = 0_{d−s}.

We introduce some additional notation. Let Z̃ = (Z^T, 0_p^T, 0)^T. For k = 0, 1, let μ_k = ∫ x^k K(x) dx, ν_k = ∫ x^k K²(x) dx, and ρ_k(u, z, v, w_0) = P(X ≥ u | Z = z, V = v, W = w_0)\,λ^k(u | z, v, w_0). Denote

$$ \varphi_{k,l}(t, w_0) = f(w_0)\,E\{\rho_0(t, Z, V, w_0)\,\tilde Z^{\otimes k}(V^{\otimes l})^T \mid W = w_0\}, \quad k = 0, 1,\ l = 0, 1, $$

$$ \varphi_{2,0}(t, w_0) = f(w_0)\,E\left\{\rho_0(t, Z, V, w_0)\begin{pmatrix} Z^{\otimes 2} & 0_{p\times p} & 0_p \\ 0_{p\times p} & \mu_2 Z^{\otimes 2} & \mu_2 Z \\ 0_p^T & \mu_2 Z^T & \mu_2 \end{pmatrix} \,\middle|\, W = w_0\right\}, \tag{11} $$

where f(·) is the density of W. Denote τ_1(w) = (I_p, 0_{p×p}, 0_p)\,m_1^{-1}(w)\,m_2(w) and τ_2(w) = ∫_0^w (0_p^T, 0_p^T, 1)\,m_1^{-1}(w_0)\,m_2(w_0)\,dw_0, where

$$ m_1(w_0) = \int_0^\tau\left\{\varphi_{2,0}(t, w_0) - \frac{\varphi_{1,0}(t, w_0)\,\varphi_{1,0}^T(t, w_0)}{\varphi_{0,0}(t, w_0)}\right\}dt, \qquad m_2(w_0) = \int_0^\tau\left\{\varphi_{1,1}(t, w_0) - \frac{\varphi_{1,0}(t, w_0)\,\varphi_{0,1}(t, w_0)}{\varphi_{0,0}(t, w_0)}\right\}dt. \tag{12} $$

For k = 0, 1, 2 and l = 0, 1, let κ_{k,l}(t) = E[ρ_l(t, Z, V, W)\{V − τ_1^T(W)Z − τ_2^T(W)\}^{⊗k}], and let V_i^*(W_i) = V_i^T − Z_i^T τ_1(W_i) − τ_2(W_i), i = 1, …, n. Write ‖a‖ = (a^T a)^{1/2} for any vector a.

Theorem 1. If Assumptions (A.i)–(A.vi) are satisfied and b_n → 0 as n → ∞, then there exists a local minimizer α̂ of L_n^*(α) such that ‖α̂ − α_0‖ = O_p(n^{−1/2} + a_n).

Remark 2. Theorem 1 indicates that, by choosing a proper λ_n leading to a_n = O(n^{−1/2}), there exists a root-n consistent penalized least squares estimator of α.

Theorem 3. Assume that

$$ \liminf_{n\to\infty}\ \liminf_{\theta\to 0^+}\ \frac{p'_{\lambda_n}(\theta)}{\lambda_n} > 0. \tag{13} $$

Under Assumptions (A.i)–(A.vi), if λ_n → 0 and √n λ_n → ∞ as n → ∞, then the root-n consistent local minimizer α̂ = (α̂_1^T, α̂_2^T)^T in Theorem 1 satisfies, with probability tending to 1:

(1) α̂_2 = 0; and

(2) asymptotic normality:

$$ \sqrt{n}\,(I_1(\tau) + \Gamma)\left\{\hat\alpha_1 - \alpha_{10} - (I_1(\tau) + \Gamma)^{-1} b\right\} \longrightarrow N(0, \Sigma_1(\tau)), \tag{14} $$

where I_1(τ) and Σ_1(τ) are the first s × s submatrices of I(τ) and Σ(τ), respectively, and

$$ b = \left(p'_{\lambda_n}(|\alpha_{10}|)\operatorname{sgn}(\alpha_{10}), \ldots, p'_{\lambda_n}(|\alpha_{s0}|)\operatorname{sgn}(\alpha_{s0})\right)^T, \qquad \Gamma = \operatorname{diag}\left(p''_{\lambda_n}(|\alpha_{10}|), \ldots, p''_{\lambda_n}(|\alpha_{s0}|)\right). \tag{15} $$

3. Variable Selection for the Varying Coefficients

When some variables of Z are not relevant in the additive hazards regression model, the corresponding functional coefficients are zero. In this section, using basis function expansion techniques, we aim to estimate those zero functional coefficients as identically zero via nonconcave penalty functions.

To simplify the notation, we denote g(·) by β_0(·) and set Z_0 = 1. We can then rewrite the additive hazards regression model (1) as

$$ \lambda(t \mid Z_i^{**}, V_i, W_i) = \lambda_0(t) + \beta^{*T}(W_i)\,Z_i^{**}(t) + \alpha^T V_i, \tag{16} $$

where Z_i^{**} = (1, Z_i^T)^T and β^*(W_i) = (β_0(W_i), β^T(W_i))^T. Assume that α^* is a given root-n consistent estimator of α; from the arguments in Section 2, such an α^* is available.

3.1. Penalized Least Squares Function. We approximate each element of β^*(w) by a basis expansion; that is, β_k(w) ≈ Σ_{l=1}^{L_k} γ_{kl} B_{kl}(w) for k = 0, 1, …, p. This approximation indicates that the hazard function in (16) can be approximated as follows:

$$ \lambda(t \mid Z_i^{**}, V_i, W_i) \approx \lambda_0(t) + \sum_{k=0}^p Z_{ik}^{**}\sum_{l=1}^{L_k}\gamma_{kl}\,B_{kl}(W_i) + \alpha^T V_i. \tag{17} $$
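The basis columns B_{k1}(w), …, B_{kL_k}(w) can be generated by the standard Cox–de Boor recursion; a self-contained sketch (the cubic degree and knot layout are assumptions for illustration):

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """B-spline basis matrix (B_{k1}(x), ..., B_{kL}(x)) via the Cox-de Boor
    recursion; knots is the full (boundary-padded) knot vector."""
    t = np.asarray(knots, dtype=float)
    x = np.asarray(x, dtype=float)
    # Degree-0 basis: indicators of the half-open knot intervals.
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        nxt = np.zeros((len(x), len(t) - d - 1))
        for j in range(len(t) - d - 1):
            den1 = t[j + d] - t[j]
            den2 = t[j + d + 1] - t[j + 1]
            term = np.zeros(len(x))
            if den1 > 0:
                term += (x - t[j]) / den1 * B[:, j]
            if den2 > 0:
                term += (t[j + d + 1] - x) / den2 * B[:, j + 1]
            nxt[:, j] = term
        B = nxt
    return B  # shape (len(x), len(knots) - degree - 1)

# Cubic basis with one interior knot on [0, 1]: L = 5 columns.
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
```

On the interior of the support the columns are nonnegative and sum to one, which is the partition-of-unity property that the spline lemmas in the appendix rely on.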

Let γ_k = (γ_{k1}, …, γ_{kL_k})^T and γ = (γ_0^T, …, γ_p^T)^T. Write

$$ B(W_i) = \begin{pmatrix} B_{01}(W_i)\cdots B_{0L_0}(W_i) & 0\cdots 0 & \cdots & 0\cdots 0 \\ 0\cdots 0 & B_{11}(W_i)\cdots B_{1L_1}(W_i) & \cdots & 0\cdots 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0\cdots 0 & 0\cdots 0 & \cdots & B_{p1}(W_i)\cdots B_{pL_p}(W_i) \end{pmatrix} \tag{18} $$

and U_i(W_i) = Z_i^{**T} B(W_i). Then, by arguments similar to those used for L_n(α) in Section 2, we estimate the baseline cumulative hazard Λ_0(t) by

$$ \hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n\left[dN_i(u) - Y_i(u)\{U_i(W_i)\,\gamma + \alpha^{*T}V_i\}\,du\right]}{\sum_{i=1}^n Y_i(u)} \tag{19} $$

and estimate γ by minimizing the following least squares function with respect to γ:

$$ L_n(\gamma) = \frac{1}{2n}\sum_{i=1}^n\left(\int_0^\tau\left[dN_i(t) - Y_i(t)\,d\hat\Lambda_0(t) - Y_i(t)\{U_i(W_i)\,\gamma + \alpha^{*T}V_i\}\,dt\right]\right)^2. \tag{20} $$

Suppose that β_k(·) ≡ 0 for k = q + 1, …, p, as mentioned before. We aim to correctly identify these zero functional coefficients via nonconcave penalty functions, that is, by minimizing the following penalized least squares function with respect to γ:

$$ PL_n(\gamma) = L_n(\gamma) + n\sum_{k=0}^p p_{\lambda_n}(\|\gamma_k\|_{R_k}), \tag{21} $$

where ‖γ_k‖²_{R_k} = γ_k^T R_k γ_k and R_k = (r_{k,ij})_{L_k × L_k} with r_{k,ij} = ∫ B_{ki}(w)B_{kj}(w) dw.

Let γ̂ be the minimizer of the penalized least squares function (21) and define β̂^*(w) = B(w)γ̂.
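The matrix R_k in (21) is simply the Gram matrix of the kth basis on W; a sketch computing it by trapezoidal quadrature (the toy basis {1, w} and the interval [0, 1] are placeholders chosen so the exact answer is known):

```python
import numpy as np

def gram_matrix(basis_fn, a, b, n_grid=4001):
    """R_k with entries r_{k,ij} = int_a^b B_{ki}(w) B_{kj}(w) dw,
    approximated by the trapezoidal rule on a fine grid."""
    w = np.linspace(a, b, n_grid)
    B = basis_fn(w)                              # (n_grid, L_k) basis matrix
    prod = B[:, :, None] * B[:, None, :]         # pointwise products B_i B_j
    dw = w[1] - w[0]
    return dw * (prod.sum(axis=0) - 0.5 * (prod[0] + prod[-1]))

def group_norm(gamma_k, R_k):
    """The penalized quantity ||gamma_k||_{R_k} = sqrt(gamma_k' R_k gamma_k)."""
    return float(np.sqrt(gamma_k @ R_k @ gamma_k))

# Toy basis {1, w} on [0, 1]; exact Gram matrix is [[1, 1/2], [1/2, 1/3]].
R = gram_matrix(lambda w: np.column_stack([np.ones_like(w), w]), 0.0, 1.0)
```

Penalizing ‖γ_k‖_{R_k} rather than ‖γ_k‖ makes the penalty equal p_{λ_n}(‖β_k‖_{L_2}), so an entire coefficient function is shrunk to zero as a group.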

Remark 4. Various basis systems, including the Fourier basis, polynomial basis, and B-spline basis, can be used in the basis expansion. We focus on the B-spline basis and examine the asymptotic properties of β̂^*(w).

3.2. Asymptotic Properties. Define Ū(t) = Σ_{i=1}^n Y_i(t)U_i(W_i) / Σ_{i=1}^n Y_i(t) and let γ̃ be the minimizer of L_n(γ) in (20). Hence, we can obtain γ̃ by solving

$$ U_n(\gamma) = \sum_{i=1}^n\int_0^\tau\{U_i(W_i) - \bar U(t)\}^T\left[dN_i(t) - Y_i(t)\{U_i(W_i)\,\gamma + \alpha^{*T}V_i\}\right]dt = 0. \tag{22} $$

Here U_n(γ), the derivative of L_n(γ) with respect to γ, is the usual score function of γ without the penalty.

For any square integrable functions b(w) and c(w) on W, let ‖b(w)‖_{L_2} denote the L_2 norm of b(w) on W, and define ‖b(w) − c(w)‖_{L_∞} = sup_{w∈W}|b(w) − c(w)| as the L_∞ distance between b(w) and c(w). Let G_k denote the set of all functions of the form Σ_{l=1}^{L_k} γ_{kl}B_{kl}(w), k = 0, 1, …, p, for the B-spline basis. Let L_max = max_{0≤k≤p} L_k and ρ_n = max_{0≤k≤p} inf_{c∈G_k} ‖β_k − c‖_{L_∞}. Denote β̃^*(w) = B(w)γ̃.

Let a_n^* = max{p'_{λ_n}(‖β_{k0}‖_{L_2}) : ‖β_{k0}‖_{L_2} ≠ 0} and b_n^* = max{p''_{λ_n}(‖β_{k0}‖_{L_2}) : ‖β_{k0}‖_{L_2} ≠ 0}.

Theorem 5. Suppose that (13) and Assumptions (A.i) and (A.vii)–(A.ix) hold. If λ_n → 0 and λ_n / max{ρ_n, a_n^*, (L_max/n)^{1/2}} → ∞ as n → ∞, then one has

(a) β̂_k = 0, k = q + 1, …, p, with probability approaching 1; and

(b) ‖β̂_k − β_k‖_{L_2} = O_p(max{ρ_n, a_n^*, (L_max/n)^{1/2}}), k = 0, …, q.

Remark 6. Let G_k be a space of splines with degree no less than 1 and with L_k equally spaced interior knots, where L_k ≍ n^{1/5}, k = 0, 1, …, p. Note that ρ_n = O(L_max^{−2}) [20, Theorem 6.21]. Hence, if a_n^* = o_p(ρ_n), Theorem 5 implies that ‖β̂_k − β_k‖_{L_2} = O_p(n^{−2/5}), k = 0, …, q.

Remark 7. Suppose that L_k ≍ n^{1/5}, k = 0, 1, …, p. Because a_n^* = max{p'_{λ_n}(‖β_{k0}‖) : ‖β_{k0}‖ ≠ 0}, and the derivatives of the hard thresholding and SCAD penalties vanish for arguments bounded away from zero once λ_n → 0, we have a_n^* = 0 for n sufficiently large; hence, for these two penalties, the convergence rate of the penalized least squares estimator is n^{−2/5}. This is the optimal rate for nonparametric regression [21]. However, for the L_1 penalty, a_n^* = λ_n; hence λ_n = O_p(ρ_n) is required to attain the optimal rate. On the other hand, the oracle property in Theorem 5 requires λ_n/ρ_n → ∞, which contradicts λ_n = O_p(ρ_n). As a consequence, for the L_1 penalty, the oracle property does not hold.
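The n^{−2/5} rate arises from balancing the spline approximation error and the estimation error; under the knot choice L_k ≍ n^{1/5} of Remark 6,

```latex
\rho_n = O\!\left(L_{\max}^{-2}\right) = O\!\left((n^{1/5})^{-2}\right) = O\!\left(n^{-2/5}\right),
\qquad
\left(\frac{L_{\max}}{n}\right)^{1/2} = O\!\left((n^{1/5-1})^{1/2}\right) = O\!\left(n^{-2/5}\right),
```

so the two error terms in Theorem 5(b) are of the same order and max{ρ_n, (L_max/n)^{1/2}} = O(n^{−2/5}).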

Next, we establish the asymptotic normality of β̂^*. First, we consider β̂^{(1)} = (β̂_0, …, β̂_q)^T, the penalized estimator of β^{(1)} = (β_0, …, β_q)^T. Let Z_i^{(1)}, U_i^{(1)}, and Ū^{(1)} denote the selected columns of Z_i^{**}, U_i, and Ū, respectively, corresponding to β^{(1)}. Similarly, let B^{(1)}(W_i) denote the selected diagonal blocks of B(W_i), i = 1, …, n. Define γ^{(1)} = (γ_0^T, γ_1^T, …, γ_q^T)^T and γ̂^{(1)} = (γ̂_0^T, γ̂_1^T, …, γ̂_q^T)^T. Then by part (a) of Theorem 5 and (21), γ̂^{(1)} is a local solution of

$$ 0 = -\sum_{i=1}^n\int_0^\tau\left(U_i^{(1)}(W_i) - \bar U^{(1)}(t)\right)^T\left[dN_i(t) - Y_i(t)\{U_i^{(1)}(W_i)\,\gamma^{(1)} + \alpha^{*T}V_i\}\right]dt + n\sum_{k=0}^q \frac{\partial p_{\lambda_n}(\|\gamma_k\|_{R_k})}{\partial\gamma_k}. \tag{23} $$

Recall that α^* is root-n consistent. It follows from (23) that

$$ \begin{aligned} 0 = {}& \frac{1}{n}\sum_{i=1}^n\int_0^\tau\left\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\right\}^T dM_i(t)\,\{1 + O_p(n^{-1/2})\} \\ & - \frac{1}{n}\sum_{i=1}^n\int_0^\tau\left\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\right\}^T Y_i(t)\left\{U_i^{(1)}(W_i)\,\hat\gamma^{(1)} - Z_i^{(1)T}\beta^{(1)}(W_i)\right\}dt \\ = {}& \frac{1}{n}\sum_{i=1}^n\int_0^\tau\left\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\right\}^T dM_i(t) \\ & + \frac{1}{n}\sum_{i=1}^n\int_0^\tau\left\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\right\}^T Z_i^{(1)T}\,Y_i(t)\left\{\beta^{(1)}(W_i) - \hat\beta^{(1)}(W_i)\right\}dt. \end{aligned} \tag{24} $$

Thus we can obtain the following theorem.

Theorem 8. Suppose that Assumptions (A.i) and (A.vii)–(A.ix) hold, lim_{n→∞} b_n^* = 0, λ_n → 0, and λ_n / max{ρ_n, a_n^*, (L_max/n)^{1/2}} → ∞ as n → ∞. For any w ∈ W, let γ̄^{(1)} and β̄^{(1)}(w) be the conditional means of γ̂^{(1)} and β̂^{(1)}(w) given {Z_i, V_i, W_i, i = 1, 2, …, n}. We then have

$$ \sqrt{\frac{n}{L_{\max}}}\left\{B^{(1)}(w)\left(\Sigma_1 + \Gamma^*\right)^{-1}\Omega^*\left(\Sigma_1 + \Gamma^*\right)^{-1}B^{(1)T}(w)\right\}^{-1/2}\left\{\hat\beta^{(1)}(w) - \bar\beta^{(1)}(w)\right\} \longrightarrow N(0, I_{q+1}), \tag{25} $$

where Γ^* = diag((L_max/2)(p'_{λ_n}(‖β_k‖_{L_2})/‖β_k‖_{L_2})R_k)_{0≤k≤q}, Σ_1 = lim_{n→∞}(L_max/n)Σ_{i=1}^n ∫_0^τ {U_i^{(1)}(W_i) − Ū^{(1)}(t)}^{⊗2} Y_i(t) dt, and Ω^* = lim_{n→∞}(L_max/n)Σ_{i=1}^n ∫_0^τ {U_i^{(1)}(W_i) − Ū^{(1)}(t)}^{⊗2} Y_i(t) λ_i(t) dt.

4. Simulation Studies

We carried out two sets of simulation studies to examine the finite-sample properties of the proposed methods. W was generated from a uniform distribution over [0, 3], and λ_n was selected via cross-validation [7]. We set a = 2 + √3 in the SCAD penalty and report sampling properties based on 200 replications. We consider two scenarios. Scenario (I) examines the performance of the regression coefficient estimates. Without loss of generality, we set p = 1. We generated failure times from the partially linear additive hazards model (1) with β(w) = 1.2 + sin(2w), α = (1.5, 0, 1, 0, 0)^T, g(w) = 0.3w², and λ_0(t) = 0.5. Covariate Z was generated from a uniform distribution over [0, 1]; V_1 was a Bernoulli random variable taking the value 0 or 1 with probability 0.5; V_2 and V_3 were generated from uniform distributions over [0, 1] and [0, 2], respectively; and (V_4, V_5) was generated from a normal distribution with mean zero, variances 0.25, and covariance 0.25. We generated the censoring time C from a uniform distribution over [c_0/2, 3c_0/2]. We took c_0 = 0.86 to yield an approximate censoring rate of CR = 20% and c_0 = 0.35 to yield an approximate censoring rate of CR = 30%. We used sample size n = 200. We selected the optimal bandwidth h_opt by using the method of [22] and found its value to be approximately 0.203. We present the average number of zero coefficients in Table 1, in which the column labeled "correct" gives the average restricted to the true zero coefficients, while the column labeled "incorrect" gives the average number of coefficients erroneously set to 0. We also report the standard deviations (SD), the average standard errors (SE), and the coverage probabilities (CP) of the 95% confidence intervals for the nonzero parameters in Table 2.
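The scenario (I) design above can be reproduced as follows (a sketch; the stated variance/covariance of (V_4, V_5) makes their joint distribution degenerate, and it is used here exactly as stated):

```python
import numpy as np

rng = np.random.default_rng(1)

def scenario_one(n=200, c0=0.86):
    """Generate one scenario (I) dataset: covariates, failure time T from
    model (1) with a constant-in-time hazard, censoring C ~ U[c0/2, 3c0/2]."""
    W = rng.uniform(0.0, 3.0, n)
    Z = rng.uniform(0.0, 1.0, n)
    V45 = rng.multivariate_normal([0.0, 0.0], [[0.25, 0.25], [0.25, 0.25]], n)
    V = np.column_stack([
        rng.binomial(1, 0.5, n).astype(float),   # V1 ~ Bernoulli(0.5)
        rng.uniform(0.0, 1.0, n),                # V2 ~ U[0, 1]
        rng.uniform(0.0, 2.0, n),                # V3 ~ U[0, 2]
        V45,                                     # (V4, V5), variance/cov 0.25
    ])
    alpha = np.array([1.5, 0.0, 1.0, 0.0, 0.0])
    rate = 0.5 + (1.2 + np.sin(2.0 * W)) * Z + V @ alpha + 0.3 * W ** 2
    T = rng.exponential(1.0 / rate)              # constant hazard => exponential
    C = rng.uniform(c0 / 2.0, 3.0 * c0 / 2.0, n)
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return W, Z, V, X, delta
```

The hazard is bounded below by λ_0 = 0.5 here, so the exponential draw is always well defined; c_0 tunes the censoring rate.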

As shown in Tables 1 and 2, SCAD, HARD, and L1 perform well and select about the same correct number of significant variables. The coverage probability (CP) based on SCAD and HARD, however, is better than that based on L1.

In scenario (II), we focused on the functional coefficients β(·). We set d = 1, g(w) = 0.3w², and α = 1. We generated failure times from the partially linear additive hazards model (1) with β_1(w) = 1.2 + sin(2w), β_2(w) = 0.5w², and β_3(w) = β_4(w) = β_5(w) = 0. Both covariates V and Z_1 were generated from a uniform distribution over

Table 1: Results of the simulation study for scenario (I). Here "correct" is the average number of true zero coefficients correctly set to zero and "incorrect" is the average number of nonzero coefficients erroneously set to zero.

            n = 200, CR = 20%          n = 200, CR = 30%
            SCAD    HARD    L1         SCAD    HARD    L1
Correct     2.73    2.76    2.67       2.85    2.85    2.85
Incorrect   0.12    0.11    0.21       0.13    0.13    0.15

Table 2: Simulation results for scenario (I): bias, standard error (SE), standard deviation (SD), and coverage probability (CP).

                   α_1                        α_3
           SCAD    HARD    L1         SCAD    HARD    L1
n = 200, CR = 20%
bias      0.0738  0.0729  0.0527     0.0552  0.0475  0.0324
SE        0.0767  0.0767  0.0757     0.0567  0.0566  0.0558
SD        0.0635  0.0645  0.0631     0.0495  0.0501  0.0558
CP        0.96    0.91    0.91       0.90    0.89    0.88
n = 200, CR = 30%
bias      0.0726  0.0726  0.0402     0.0480  0.0466  0.0280
SE        0.0824  0.0824  0.0812     0.0612  0.0612  0.0602
SD        0.0701  0.0721  0.0726     0.0618  0.0602  0.0613
CP        0.93    0.89    0.92       0.94    0.90    0.91

[0, 1]; (Z_2, Z_3)^T was generated from a normal distribution with mean (0.5, 0.5)^T, variances (0.25, 0.25), and covariance 0; Z_4 was a Bernoulli random variable taking the value 0 or 1 with probability 0.5; and Z_5 was uniform over [0, 2]. In scenario (II), we took c_0 = 1.24 to yield an approximate censoring rate of CR = 10% and c_0 = 0.45 to yield an approximate censoring rate of CR = 30%. We used sample size n = 200. The average number of zero functional coefficients based on SCAD and HARD is reported in Table 3, and the fitted curves of the nonzero functional coefficients are presented in Figures 1 and 2 based on SCAD and HARD, respectively.

Notice that penalized estimates of functional coefficients are based on the results in Step 1. Because the oracle property does not hold for the L1 penalty according to Remark 7, we did not take L1 penalty into account.

We can see from Table 3 and Figures 1 and 2 that the penalized spline estimation of the functional coefficient that we proposed performed well. Furthermore, for the varying coefficient part, the "correct" numbers of significant variables selected based on SCAD and HARD penalties are close. In comparing the results between different censoring rates, it is also shown that our method is extremely robust against the censoring rate.

5. Real Data Example

In this section, we apply the proposed method to a SUPPORT (Study to Understand Prognoses Preferences Outcomes and Risks of Treatment) dataset [23]. This study was a multicenter study designed to examine outcomes and clinical decision

Table 3: Results from the simulation study for scenario (II). The legend is the same as in Table 1.

            n = 200, CR = 10%      n = 200, CR = 30%
            SCAD     HARD          SCAD     HARD
Correct     2.61     2.79          2.625    2.745
Incorrect   0.165    0.195         0.48     0.48

making for seriously ill hospitalized patients. One thousand patients suffering from one of eight different diseases, including acute respiratory failure (ARF), chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), cirrhosis, coma, colon cancer, lung cancer, and multiple organ system failure (MOSF), were followed for up to 5.56 years. 672 observations were collected, including age, sex, race, coma score (scoma), number of comorbidities (num.co), charges, average of the therapeutic intervention scoring system (avtiss), white blood cell count (wbc), heart rate (hrt), respiratory rate (resp), temperature (temp), serum bilirubin level (bili), creatinine (crea), serum sodium (sod), and imputed ADL calibrated to surrogate (adlsc). For more details about this study, refer to [23].

We suppose that the binary and multinary risk factors have constant coefficients, which yields a 12-dimensional parameter after transforming the multinary variables into binary ones. Conversely, the other risk factors are supposed to have varying coefficients, that is, a 13-dimensional functional coefficient (including g(·)). Here the main exposure variable W corresponds to age. The censoring rate of the 672 patients is 32.7%. The model we considered is

$$ \lambda(t \mid Z_i, V_i, \mathrm{age}_i) = \lambda_0(t) + \sum_{k=1}^{12}\beta_k(\mathrm{age}_i)\,Z_{ik} + \sum_{j=1}^{12}\alpha_j V_{ij} + g(\mathrm{age}_i), \tag{26} $$

where V_{ij}, j = 1, 2, …, 12, denote the binary variables of the ith patient and Z_{ik}, k = 1, 2, …, 12, denote the other covariates of the ith patient. Here, the baseline hazard function λ_0(t) is the failure risk of patients who are white females suffering from MOSF with all other variables equal to zero.

By fitting model (26) with a SCAD penalty, the parameter of "other race" is identified as zero, and functional coefficients of num.co, charges, wbc, hrt, temp, crea, sod, adlsc, and g(-) are identified as zero functions. Identified significant risk factors and results of their parametric or functional coefficients are shown in Figure 3 and Table 4.

The identified zero parameter of "other race" shows that, besides Asian, black, and Hispanic, "other race" has no significantly different impact on risk in contrast with white people. Table 4 demonstrates that Asians have the lowest failure probability, as do people who are suffering from MOSF. Conversely, coma is the most dangerous among these eight diseases. Figure 3 shows the fitted coefficients and their 95% pointwise confidence intervals for the five risk factors identified as significant.

Table 4: Application to the SUPPORT data. Estimation results for the significant binary variables using the SCAD penalty.

Risk factor       Estimator    Variance (×10⁻³)
Sex                0.0006       0.0178
Disease group
  ARF              0.0007       0.0247
  CHF              0.0011       0.0309
  Cirrhosis        0.0018       0.0450
  Colon cancer     0.0034       0.0580
  Lung cancer      0.0042       0.0585
  Coma             0.0074       0.1410
  COPD             0.0017       0.0344
Asian             -0.0002       0.0414
Black              0.0006       0.0232
Hispanic           0.0008       0.0421

Appendices

In this appendix, we list assumptions and outline the proofs of the main results. The following assumptions are imposed.

A. Assumptions

(A.i) The density f(w) of W is continuous, has compact support W, and satisfies inf_{w∈W} f(w) > 0.

(A.ii) β(·) and g(·) have continuous bounded second derivatives on W.

(A.iii) The kernel function K(·) is bounded and symmetric and has compact bounded support. ∫_0^τ λ_0(t) dt < ∞.

(A.iv) h → 0, nh → ∞, and nh⁴ → 0 as n → ∞.

(A.v) The conditional probability ρ_0(t, z, v, w) is equicontinuous in the arguments (t, w) on the product space [0, τ] × W.

(A.vi) φ_{0,0}(t, w) and κ_{0,0}(t) are bounded away from zero on the product space [0, τ] × W, m_1(·) and m_2(·) are continuous on W, m_1(w_0) is nonsingular for all w_0 ∈ W, and

$$ I(\tau) = \int_0^\tau\left\{\kappa_{2,0}(t) - \frac{\kappa_{1,0}^{\otimes 2}(t)}{\kappa_{0,0}(t)}\right\}dt, \qquad \Sigma(\tau) = \int_0^\tau\left\{\kappa_{2,1}(t) + \frac{\kappa_{1,0}^{\otimes 2}(t)\,\kappa_{0,1}(t)}{\kappa_{0,0}^2(t)} - \frac{\kappa_{1,0}(t)\,\kappa_{1,1}^T(t) + \kappa_{1,1}(t)\,\kappa_{1,0}^T(t)}{\kappa_{0,0}(t)}\right\}dt $$

are positive-definite.

(A.vii) The eigenvalues of the matrix E[∫_0^τ Z(t)Z^T(t)Y(t) dt] are uniformly bounded away from 0.

Figure 1: Results of the simulation study for scenario (I): the estimated curves of the nonzero functional coefficients and g(w). The solid, dashed, and dotted curves represent the true function, estimated functions, and 95% pointwise confidence intervals, respectively. Left panels (a), (c), (e): HARD penalty; right panels (b), (d), (f): SCAD penalty.

(A.viii) lim sup_n (max_k L_k / min_k L_k) < ∞.

(A.ix) lim_{n→∞} n^{−1} L_max log(L_max) = 0.

B. Technical Lemmas

Write c_n = (nh)^{−1/2} + h². Before proving the theorems, we present several lemmas used in the proofs of the main results. Lemma B.1 will be used in the proofs of Theorems 1 and 3. Lemma B.5 establishes the convergence rate of the spline approximation of the nonparametric functions; its proof can be completed in a similar way as in [24]. Define

$$ C_{n1}(t, w_0) = n^{-1}\sum_{i=1}^n Y_i(t)\,q_1\!\left(t, \frac{W_i - w_0}{h}, Z_i, V_i, w_0\right)K_h(W_i - w_0), \qquad C_{n2}(t, w_0) = n^{-1}\sum_{i=1}^n Y_i(t)\,q_2(t, W_i, Z_i, V_i) \tag{B.1} $$

for functions q_1(·, ·) and q_2(·, ·).

Figure 2: Results of the simulation study for scenario (II). The legend is the same as in Figure 1.

Lemma B.1. Assume that q_1(t, u, Z, V, w) is equicontinuous in the arguments w and t, that q_2(t, W, Z, V) is equicontinuous in the argument t, and that E(q_1(t, u, Z, V, W) | W = w_0) is equicontinuous in the argument w_0. Under Assumptions (A.i)–(A.v), one has, for each w_0 ∈ W,

$$ C_{nj}(t, w_0) = C_j(t, w_0) + O_p(c_n), \qquad \sup_{0\le t\le\tau}\ \sup_{w_0\in W}\left|C_{nj}(t, w_0) - C_j(t, w_0)\right| \longrightarrow 0, \tag{B.2} $$

where C_j(t, w_0) = f(w_0)∫E{Y(t)q_j(t, u, Z, V, w_0) | W = w_0}K(u) du for j = 1, 2.

Its proof can be completed by using arguments similar to those in Lemma 1 of [25].

Lemma B.2. Let U_n^*(α_0) = −(∂L_n(α)/∂α)|_{α=α_0}. Assume that (13) and Assumptions (A.i)–(A.vi) hold. We then have

$$ U_n^*(\alpha_0) = \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t)\right\}^T dM_i(t) + o_p(\sqrt n) \overset{\text{def}}{=} \sqrt n\,\tilde U_n(\alpha_0) + o_p(\sqrt n), \tag{B.3} $$

where V̄^*(t) = Σ_{i=1}^n Y_i(t)V_i^*(W_i) / Σ_{i=1}^n Y_i(t).

Figure 3: Results of the real data example. The estimated curves of the nonzero functional coefficients, including those of scoma, avtiss, and bili, plotted against age. The solid and dashed curves represent the estimated curve and 95% pointwise confidence intervals (×10⁴).

Proof. Because U_n^*(α_0) = −(∂L_n(α)/∂α)|_{α=α_0}, we have

$$ U_n^*(\alpha_0) = \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t)\right\}^T\left[dN_i(t) - Y_i(t)\{G_{ni}(t) + V_i^*(W_i)\,\alpha_0\}\,dt\right] = \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t)\right\}^T\left[dN_i(t) - Y_i(t)\{Z_i^T\hat\beta(W_i, \alpha_0) + \hat g(W_i, \alpha_0) + \alpha_0^T V_i\}\,dt\right] \tag{B.4} $$

with V̄^*(t) = Σ_{i=1}^n Y_i(t)V_i^*(W_i)/Σ_{i=1}^n Y_i(t). By a discussion similar to that in [5], we can prove that the biases of β̂(w_0, α_0) and ĝ(w_0, α_0) are of order O_p(h²) at each point w_0 ∈ W.

Furthermore, recall that V_i^*(W_i) = V_i^T − Z_i^T T_{n2}(W_i) − T_{n4}(W_i) with T_{n2}(W_i) = (I_p, 0_{p×p}, 0_p) M_{n1}^{−1}(W_i) S_{n2}(W_i) and T_{n4}(W_i) = ∫_0^{W_i} (0_p^T, 0_p^T, 1) M_{n1}^{−1}(w_0) S_{n2}(w_0) dw_0, where

$$ M_{n1}(w_0) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau K_h(W_i - w_0)\{Z_i^*(w_0) - \bar Z(t, w_0)\}^{\otimes 2}Y_i(t)\,dt = \int_0^\tau\left\{\Theta_{n20}(t, w_0) - \frac{\Theta_{n10}^{\otimes 2}(t, w_0)}{\Theta_{n00}(t, w_0)}\right\}dt, \tag{B.5} $$

$$ S_{n2}(w_0) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau K_h(W_i - w_0)\{Z_i^*(w_0) - \bar Z(t, w_0)\}V_i^T Y_i(t)\,dt = \int_0^\tau\left\{\Theta_{n11}(t, w_0) - \frac{\Theta_{n10}(t, w_0)}{\Theta_{n00}(t, w_0)}\,\Theta_{n01}(t, w_0)\right\}dt, \tag{B.6} $$

with Θ_{nkl}(t, w_0) = (1/n)Σ_{i=1}^n K_h(W_i − w_0){Z_i^*(w_0)}^{⊗k}(V_i^{⊗l})^T Y_i(t) for k = 0, 1, 2 and l = 0, 1. It follows from Lemma B.1 that

$$ M_{n1}(w_0) = m_1(w_0) + O_p(c_n), \qquad S_{n2}(w_0) = m_2(w_0) + O_p(c_n) \tag{B.7} $$

for each w_0 ∈ W. This implies that

$$ T_{n2}(w) = \tau_1(w) + O_p(c_n), \qquad T_{n4}(w) = \tau_2(w) + O_p(c_n) $$

hold uniformly on W. Hence it follows from Assumption (A.iv), arguments similar to those in the proof of Theorem 3 of [5], and the martingale central limit theorem that

$$ U_n^*(\alpha_0) = \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t) + O_p(c_n)\right\}^T dM_i(t) - \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t) + O_p(c_n)\right\}^T Y_i(t)\left[\{Z_i^T, 0_p^T, 1\}\,m_1^{-1}(W_i)\,O_p(c_n)\right]dt + o_p(\sqrt n) \tag{B.8} $$

$$ = \sum_{i=1}^n\int_0^\tau\left\{V_i^*(W_i) - \bar V^*(t)\right\}^T dM_i(t) + o_p(\sqrt n). \tag{B.9} $$

Thus (B.3) holds. □

Lemma B.3. Assume that (13) and Assumptions (A.i)–(A.vi) hold. If λ_n → 0 and √n λ_n → ∞ as n → ∞, then with probability tending to 1, for any given α_1 satisfying ‖α_1 − α_{10}‖ = O_p(n^{−1/2}) and any given constant C,

$$ L_n^*\left\{(\alpha_1^T, 0^T)^T\right\} = \min_{\|\alpha_2\|\le Cn^{-1/2}} L_n^*\left\{(\alpha_1^T, \alpha_2^T)^T\right\}. \tag{B.10} $$

Proof. It suffices to prove that, for any given α_1 satisfying ‖α_1 − α_{10}‖ = O_p(n^{−1/2}), any given constant C > 0, and each α_j, j = s + 1, …, d,

$$ \frac{\partial L_n^*(\alpha)}{\partial\alpha_j} < 0 \quad\text{if } -Cn^{-1/2} < \alpha_j < 0, \qquad \frac{\partial L_n^*(\alpha)}{\partial\alpha_j} > 0 \quad\text{if } 0 < \alpha_j < Cn^{-1/2} \tag{B.11} $$

hold with probability tending to 1. Notice that U_n^*(α_0) = −(∂L_n(α)/∂α)|_{α=α_0}. Then, for each α in a neighborhood of α_0, from Lemma B.2 we have

— {L„ (a)-L„ («o)|

= -l/„ («o)(« - «o)jl+op (1)}

+ 2(« - «o)T V«n («o) (« - «o) i1 + Op (1)} .

(B.12)

By algebra similar to the proof of Lemma B.2, we know that

$$\begin{aligned}
\Omega_n(\alpha_0) &= \frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{\tilde V_i(W_i) - \bar V(t)\}^{\otimes 2}\,Y_i(t)\,\lambda_0(t)\,dt + O_p(c_n)\\
&= E\Big(\int_0^\tau\{\tilde V(W) - \bar V(t)\}^{\otimes 2}\,Y(t)\,\lambda_0(t)\,dt\Big) + O_p(c_n)\\
&= \Omega^*(\alpha_0) + O_p(c_n),
\end{aligned}\tag{B.13}$$

which is positive-definite with probability tending to 1 by Assumption (A.vi) and Lemma B.1. Hence, for $\alpha - \alpha_0 = O_p(n^{-1/2})$, by the fact that $U_n = O_p(1)$ we have

$$\frac{1}{\sqrt n}\,\frac{\partial L_n(\alpha)}{\partial\alpha} = -U_n\{1 + o_p(1)\} + \sqrt n\,\Omega^*(\alpha_0)(\alpha - \alpha_0)\{1 + o_p(1)\} = \sqrt n\,\Omega^*(\alpha_0)(\alpha - \alpha_0) + O_p(1), \tag{B.14}$$

and then

$$\frac{\partial L_n^*(\alpha)}{\partial\alpha_j} = \frac{\partial L_n(\alpha)}{\partial\alpha_j} + n\,p_{\lambda_n}'(|\alpha_j|)\,\mathrm{sgn}(\alpha_j) = n\lambda_n\Big\{O_p\Big(\frac{1}{\sqrt n\,\lambda_n}\Big) + \frac{p_{\lambda_n}'(|\alpha_j|)}{\lambda_n}\,\mathrm{sgn}(\alpha_j)\Big\}. \tag{B.15}$$

As a result, it follows from (13) and from $\lambda_n \to 0$, $\sqrt n\,\lambda_n \to \infty$ as $n \to \infty$ that $\mathrm{sgn}(\partial L_n^*(\alpha)/\partial\alpha_j) = \mathrm{sgn}(\alpha_j)$ for each $\alpha - \alpha_0 = O_p(n^{-1/2})$, which indicates that (B.11) holds with probability tending to 1. □
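The sign argument above hinges only on $p_{\lambda_n}'$ staying of exact order $\lambda_n$ near zero, so that the deterministic penalty term in (B.15) dominates the $O_p(1/(\sqrt n\,\lambda_n))$ stochastic term once $\sqrt n\,\lambda_n \to \infty$. As a hedged illustration (the appendix does not fix a particular penalty; the SCAD penalty of Fan and Li [7] is assumed here, and all numerical values are hypothetical):

```python
import numpy as np

# SCAD penalty derivative from Fan and Li (2001) [7]; using SCAD here is an
# assumption -- the proof only needs a penalty with these properties.
def scad_deriv(theta, lam, a=3.7):
    """p'_lambda(theta) for theta >= 0."""
    theta = np.asarray(theta, dtype=float)
    return lam * ((theta <= lam)
                  + np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
                  * (theta > lam))

# In a C * n^{-1/2} neighborhood of zero, p'_lam(|alpha_j|) / lam equals 1,
# while the stochastic term in (B.15) is of order 1 / (sqrt(n) * lam).
n, lam = 10_000, 0.05            # illustrative (assumed) values
alpha_j = 0.01                   # a point close to zero
penalty_scale = float(scad_deriv(abs(alpha_j), lam)) / lam
noise_scale = 1.0 / (np.sqrt(n) * lam)
print(penalty_scale, noise_scale)
```

With these assumed values the penalty-to-noise ratio already exceeds 1 and grows like $\sqrt n\,\lambda_n$, which is exactly why the derivative's sign matches $\mathrm{sgn}(\alpha_j)$.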

Lemma B.4. Suppose Assumptions (A.i) and (A.vii)-(A.ix) hold; then there are positive constants $M_1$ and $M_2$ such that, except on an event whose probability tends to zero, all the eigenvalues of

$$\frac{L_{\max}}{n}\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}^{\otimes 2}\,Y_i(t)\,dt \tag{B.16}$$

lie between $M_1$ and $M_2$; consequently, $\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}^{\otimes 2}\,Y_i(t)\,dt$ is invertible.

Proof. According to Lemmas A.1 and A.2 of [24], except on an event whose probability tends to zero, $\|\sum_k \gamma_k B_k\|_{L_2}^2 \asymp \|\gamma\|^2/L_{\max}$. Note that $\gamma^T\big[\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}^{\otimes 2}\,Y_i(t)\,dt\big]\gamma \asymp n\,\|\sum_k \gamma_k B_k\|_{L_2}^2$ by Assumption (A.ii). Hence Lemma B.4 holds. □

In the rest of this paper, we denote $n^{-1}\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}^{\otimes 2}\,Y_i(t)\,dt$ by $\Sigma_n$. Let

$$\bar\gamma = \Sigma_n^{-1}\,\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}\,\big\{\lambda_0(t) + Y_i(t)\,\beta^{*T}(W_i)\,Z_i + O_p\big(n^{-1/2}\big)\big\}\,dt \tag{B.17}$$

and $\bar\beta(w) = B(w)\bar\gamma$. Then $\bar\gamma$ and $\bar\beta(w)$ are the means of $\hat\gamma$ and $\hat\beta(w)$, respectively, conditional on $\{Z_i, V_i, W_i,\ i = 1, 2, \ldots, n\}$.

Lemma B.5. Suppose Assumptions (A.ii), (A.viii), and (A.ix) hold. We then have $\rho_n \to 0$ and $\sqrt n\,\rho_n \to \infty$ as $n \to \infty$, $\|\hat\beta - \bar\beta\|_{L_2}^2 = O_p(L_{\max}/n)$, $\|\bar\beta - \beta^*\|_{L_2} = O_p(\rho_n)$, and $\|\hat\beta - \beta^*\|_{L_2}^2 = O_p(L_{\max}/n + \rho_n^2)$.

Proof. By Assumptions (A.ii) and (A.viii), $\rho_n = O(L_{\max}^{-d})$ [20, Theorem 6.27]. Hence it follows from Assumption (A.iv) that $\rho_n \to 0$ and $\sqrt n\,\rho_n \to \infty$ as $n \to \infty$. By the triangle inequality, $\|\hat\beta - \beta^*\|_{L_2} \le \|\hat\beta - \bar\beta\|_{L_2} + \|\bar\beta - \beta^*\|_{L_2}$. Note that

$$\hat\gamma - \bar\gamma = \Sigma_n^{-1}\,\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}\,\big\{dM_i(t) + O_p\big(n^{-1/2}\big)\big\}. \tag{B.18}$$

Thus, since $\Sigma_n^{-1}(1/n)\sum_{i=1}^{n}\int_0^\tau\{U_i(W_i) - \bar U(t)\}\,dM_i(t) = O_p(L_{\max}^2/n)$, which can be proved similarly to Lemma A.4 in [24], we have $\|\hat\beta - \bar\beta\|_{L_2}^2 \asymp \|\hat\gamma - \bar\gamma\|^2/L_{\max} = O_p(L_{\max}/n)$. On the other hand, by the properties of B-spline basis functions, we can prove that $\|\bar\beta - \beta^*\|_{L_2} = O_p(\rho_n)$ by an argument similar to Lemma A.7 in [24]. Thus Lemma B.5 holds. □
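Lemmas B.4 and B.5 repeatedly use the B-spline norm equivalence $\|\sum_k \gamma_k B_k\|_{L_2}^2 \asymp \|\gamma\|^2/L_{\max}$ (Lemmas A.1 and A.2 of [24]; see also [20]). A small numerical sketch of this equivalence under assumed choices (cubic splines on $[0,1]$ with equally spaced knots; the degree and knot counts are illustrative, not from the paper):

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                      # cubic B-splines (assumed degree)
n_interior = 12            # illustrative number of interior knots
t = np.concatenate([np.zeros(k + 1),
                    np.linspace(0.0, 1.0, n_interior + 2)[1:-1],
                    np.ones(k + 1)])
L = len(t) - k - 1         # number of basis functions, in the role of L_max

# Evaluate every basis function on a fine grid of [0, 1].
x = np.linspace(0.0, 1.0, 4001)
B = np.empty((L, x.size))
for i in range(L):
    coef = np.zeros(L)
    coef[i] = 1.0
    B[i] = BSpline(t, coef, k)(x)

# Gram matrix G_ij = int_0^1 B_i(x) B_j(x) dx via trapezoidal weights.
w = np.full(x.size, x[1] - x[0])
w[0] = w[-1] = (x[1] - x[0]) / 2.0
G = (B * w) @ B.T

# For random coefficient vectors gamma, the squared L2 norm of the spline,
# gamma' G gamma, stays within constant factors of ||gamma||^2 / L.
rng = np.random.default_rng(0)
ratios = [(g @ G @ g) / (g @ g / L)
          for g in rng.standard_normal((200, L))]
print(min(ratios), max(ratios))
```

The printed ratios stay bounded away from 0 and infinity as the knot count grows, which is the content of the equivalence used in the proofs.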

Lemma B.6. Suppose (13) and Assumptions (A.i) and (A.vii)-(A.ix) hold. If $\rho_n \to 0$, $\lambda_n/\rho_n \to \infty$, and $b_n^* \to 0$ as $n \to \infty$, we then have $\|\hat\beta - \beta^*\|_{L_2} = O_p\big(a_n^* + (\lambda_n\rho_n)^{1/2} + (L_{\max}/n)^{1/2}\big)$.

Proof. Using the properties of B-spline basis functions (Section A.2 of [24]), we have

$$\|\hat\beta_k - \bar\beta_k\|_{L_2} = \|\hat\gamma_k - \bar\gamma_k\|_k \asymp L_{\max}^{-1/2}\,\|\hat\gamma_k - \bar\gamma_k\|, \tag{B.19}$$

which we sum over $k$ to obtain

$$\|\hat\beta - \bar\beta\|_{L_2} = \|\hat\gamma - \bar\gamma\|_n \asymp L_{\max}^{-1/2}\,\|\hat\gamma - \bar\gamma\|. \tag{B.20}$$

Let $\delta_n = a_n^* + (\lambda_n\rho_n)^{1/2} + (L_{\max}/n)^{1/2}$. It suffices to show that, for any given $\epsilon > 0$, there exists a large enough $C$ such that

$$\Pr\Big\{\inf_{\|u\| = C} PL_n\big(\bar\gamma + L_{\max}^{1/2}\delta_n u\big) > PL_n(\bar\gamma)\Big\} \ge 1 - \epsilon, \tag{B.21}$$

which implies that $PL_n(\cdot)$ can reach a local minimum in the ball $\{\bar\gamma + L_{\max}^{1/2}\delta_n u : \|u\| \le C\}$ with probability at least $1 - \epsilon$; thus there exists a local minimizer satisfying $\|\hat\gamma - \bar\gamma\| = O_p(L_{\max}^{1/2}\delta_n)$. Let $\gamma = \bar\gamma + L_{\max}^{1/2}\delta_n u$; then by the Taylor expansion and the definition of $\bar\gamma$ we have

$$\begin{aligned}
PL_n(\gamma) - PL_n(\bar\gamma) ={}& -n(\gamma - \bar\gamma)^T\,\Sigma_n\,(\hat\gamma - \bar\gamma)\{1 + o_p(1)\}\\
&+ \frac{n}{2}(\gamma - \bar\gamma)^T\,\Sigma_n\,(\gamma - \bar\gamma)\{1 + o_p(1)\}\\
&+ n\sum_k\big\{p_{\lambda_n}(\|\gamma_k\|_k) - p_{\lambda_n}(\|\bar\gamma_k\|_k)\big\}.
\end{aligned}\tag{B.22}$$

It follows from Lemmas B.4 and B.5 that

$$PL_n(\gamma) - PL_n(\bar\gamma) = n\Big(\delta_n^2 + \delta_n\Big(\frac{L_{\max}}{n}\Big)^{1/2}\Big)\|u\|^2\,O_p(1)\{1 + o_p(1)\} + n\sum_k\big\{p_{\lambda_n}(\|\gamma_k\|_k) - p_{\lambda_n}(\|\bar\gamma_k\|_k)\big\}. \tag{B.23}$$

We first examine the second term on the right-hand side of (B.23). A direct calculation with the Taylor expansion, using $p_{\lambda_n}(0) = 0$, yields

$$\begin{aligned}
\sum_k\big\{p_{\lambda_n}&(\|\gamma_k\|_k) - p_{\lambda_n}(\|\bar\gamma_k\|_k)\big\}\\
={}& \sum_k\Big[\big\{p_{\lambda_n}(\|\gamma_k\|_k) - p_{\lambda_n}(\|\gamma_k\|_{L_2})\big\} - \big\{p_{\lambda_n}(\|\bar\gamma_k\|_k) - p_{\lambda_n}(\|\bar\gamma_k\|_{L_2})\big\}\Big]\\
\ge{}& \sum_k\Big[p_{\lambda_n}'(\|\bar\gamma_k\|_{L_2})\,\big\{(\|\gamma_k\|_k - \|\bar\gamma_k\|_{L_2}) - (\|\bar\gamma_k\|_k - \|\bar\gamma_k\|_{L_2})\big\}\\
&\qquad + \frac{1}{2}\,p_{\lambda_n}''(\|\bar\gamma_k\|_{L_2})\,\big\{(\|\gamma_k\|_k - \|\bar\gamma_k\|_{L_2})^2 - (\|\bar\gamma_k\|_k - \|\bar\gamma_k\|_{L_2})^2\big\}\,(1 + o_p(1))\Big]\\
\ge{}& -\sum_k p_{\lambda_n}'(\|\bar\gamma_k\|_{L_2})\,O_p(\delta_n)\,\|u\| - K\,O_p\big(\delta_n^2 + \delta_n\rho_n\big)\,\|u\|^2 - p_{\lambda_n}'(0)\,O_p(\rho_n).
\end{aligned}\tag{B.24}$$

Here, $p_\lambda'(0) = \lim_{\theta\downarrow 0} p_\lambda'(\theta)$. Thus, combining this with (B.23) yields

$$\begin{aligned}
PL_n(\gamma) - PL_n(\bar\gamma) \ge{}& n\Big(\delta_n^2 + \delta_n\Big(\frac{L_{\max}}{n}\Big)^{1/2}\Big)\|u\|^2\,O_p(1)\\
&- n\,a_n^*\,O_p(\delta_n)\,\|u\| - n\,K\,O_p\big(\delta_n^2 + \delta_n\rho_n\big) - n\,p_{\lambda_n}'(0)\,O_p(\rho_n).
\end{aligned}\tag{B.25}$$

Notice that $\delta_n = a_n^* + (\lambda_n\rho_n)^{1/2} + (L_{\max}/n)^{1/2}$ and $b_n^* \to 0$ as $n \to \infty$. Then, by choosing a sufficiently large $C$, the first term dominates the second and the third terms on the right-hand side of (B.25). On the other hand, according to (13), we have $\big(\delta_n + (L_{\max}/n)^{1/2}\big)^2/\{p_{\lambda_n}'(0)\,\rho_n\} = O_p\big(1 + (L_{\max} + n a_n^{*2})/(n\lambda_n\rho_n)\big)$. Hence, by choosing a sufficiently large $C$, (B.21) holds. This completes the proof of Lemma B.6. □

C. Proof of Theorem 1

Let $\gamma_n = n^{-1/2} + a_n$. It suffices to prove that, for any given $\epsilon > 0$, there exists a large enough constant $C$ such that

$$P\Big\{\inf_{\|u\| = C} L_n^*(\alpha_0 + \gamma_n u) > L_n^*(\alpha_0)\Big\} \ge 1 - \epsilon, \tag{C.1}$$

which means that $L_n^*(\cdot)$ can reach a local minimum in the ball $\{\alpha_0 + \gamma_n u : \|u\| \le C\}$ with probability at least $1 - \epsilon$; thus there exists a local minimizer satisfying $\|\hat\alpha - \alpha_0\| = O_p(\gamma_n)$. Now we are going to prove (C.1). Note that

$$L_n^*(\alpha_0 + \gamma_n u) - L_n^*(\alpha_0) = L_n(\alpha_0 + \gamma_n u) - L_n(\alpha_0) + n\sum_{j=1}^{s}\big\{p_{\lambda_n}(|\alpha_{0j} + \gamma_n u_j|) - p_{\lambda_n}(|\alpha_{0j}|)\big\}. \tag{C.2}$$

Because $U_n^*(\alpha_0) = -(\partial L_n(\alpha)/\partial\alpha)|_{\alpha = \alpha_0}$, it similarly follows from (A.iv) that

$$L_n(\alpha_0 + \gamma_n u) - L_n(\alpha_0) = -\sqrt n\,U_n(\alpha_0)^T\,\gamma_n u\,\{1 + o_p(1)\} + \frac{n}{2}\,u^T\,\Omega_n^*(\alpha_0)\,u\,\{1 + o_p(1)\}\,\gamma_n^2. \tag{C.3}$$

Hence we can obtain

$$\begin{aligned}
L_n^*(\alpha_0 + \gamma_n u) - L_n^*(\alpha_0) \ge{}& -\sqrt n\,U_n(\alpha_0)^T\,\gamma_n u\,\{1 + o_p(1)\} + \frac{n}{2}\,u^T\,\Omega_n^*(\alpha_0)\,u\,\{1 + o_p(1)\}\,\gamma_n^2\\
&+ n\sum_{j=1}^{q}\big\{\gamma_n\,p_{\lambda_n}'(|\alpha_{0j}|)\,\mathrm{sgn}(\alpha_{0j})\,u_j + \gamma_n^2\,p_{\lambda_n}''(|\alpha_{0j}|)\,u_j^2\,(1 + o(1))\big\}.
\end{aligned}\tag{C.4}$$

Notice that $U_n(\alpha_0) = O_p(1)$ and $\Omega_n(\alpha_0) = O_p(1)$. Hence, by choosing a large enough constant $C$, $-\sqrt n\,U_n(\alpha_0)^T\gamma_n u\{1 + o_p(1)\} + (n/2)\,u^T\Omega_n^*(\alpha_0)u\{1 + o_p(1)\}\gamma_n^2 > 0$ uniformly on $\|u\| = C$. By a discussion similar to that in [7], the last two terms on the right-hand side of (C.4) are bounded by $ns(\gamma_n a_n C + \gamma_n^2 b_n C^2)$. It then follows from $\lim_{n\to\infty} b_n = 0$ that, when $C$ is sufficiently large, $L_n^*(\alpha_0 + \gamma_n u) - L_n^*(\alpha_0) > 0$ holds uniformly on $\|u\| = C$. Hence (C.1) holds. □

D. Proof of Theorem 3

It follows from Lemma B.3 that part (1) holds. Now we are going to prove part (2). From part (1) we see that the local minimizer of $L_n^*(\alpha)$ has the form $\hat\alpha = (\hat\alpha_1^T, 0_{s-q}^T)^T$ and satisfies $\hat\alpha - \alpha_0 = O_p(n^{-1/2})$. It follows from Taylor's expansion that

$$\frac{\partial L_n^*(\hat\alpha)}{\partial\alpha} = -U_n^*(\hat\alpha) + n\,\big(b^T,\ 0_{s-q}^T\big)^T + n\Big\{\mathrm{diag}\big[\{p_{\lambda_n}''(|\alpha_{0i}|)\}_{i\le q},\ 0_{s-q}^T\big] + o_p(1)\Big\}(\hat\alpha - \alpha_0). \tag{D.1}$$

Since $\partial L_n^*(\hat\alpha)/\partial\alpha = 0$ at the minimizer, it follows from the proofs of Lemmas B.2 and B.3 that

$$\Big\{\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{\tilde V_i(W_i) - \bar V(t)\}^{\otimes 2}\,Y_i(t)\,dt + \mathrm{diag}\big[\{p_{\lambda_n}''(|\alpha_{0i}|)\}_{i\le q},\ 0_{s-q}^T\big] + o_p(1)\Big\}(\hat\alpha - \alpha_0) = \frac{1}{n}\,U_n^*(\alpha_0) - \big(b^T,\ 0_{s-q}^T\big)^T + o_p\big(n^{-1/2}\big). \tag{D.2}$$

Moreover,

$$\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{\tilde V_i(W_i) - \bar V(t)\}^{\otimes 2}\,Y_i(t)\,dt = \int_0^\tau\Big\{\Upsilon_{n20}(t) - \frac{\Upsilon_{n10}^{\otimes 2}(t)}{\Upsilon_{n00}(t)}\Big\}\,dt + o_p(1), \tag{D.3}$$

where $\Upsilon_{nkl}(t) = (1/n)\sum_{i=1}^{n}\{\tilde V_i(W_i)\}^{\otimes k}\,\lambda^l(t \mid Z_i, V_i, W_i)\,Y_i(t)$ for $k = 0, 1, 2$ and $l = 0, 1$. Lemma B.1 and Assumptions (A.v) and (A.vi) yield

$$\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\{\tilde V_i(W_i) - \bar V(t)\}^{\otimes 2}\,Y_i(t)\,dt = I_1(\tau) + o_p(1). \tag{D.4}$$

Next we consider $U_n(\alpha_0)$. Using the martingale central limit theorem, we can prove that $U_n(\alpha_0)$ is asymptotically normally distributed with mean zero and covariance $E[U_n(\alpha_0)]^{\otimes 2}$. Thus we obtain

$$E[U_n(\alpha_0)]^{\otimes 2} = \lim_{n\to\infty}\int_0^\tau\Big\{\Upsilon_{n21}(t) - \frac{\Upsilon_{n10}(t)\,\Upsilon_{n11}^T(t)}{\Upsilon_{n00}(t)}\Big\}\,dt = \Sigma_1(\tau). \tag{D.5}$$

Hence, by using Slutsky's theorem, it follows from (D.2)-(D.5) that

$$\sqrt n\,\big\{\hat\alpha_1 - \alpha_{10} + (I_1(\tau) + \Sigma)^{-1} b\big\} \xrightarrow{d} N\big(0,\ (I_1(\tau) + \Sigma)^{-1}\,\Sigma_1(\tau)\,(I_1(\tau) + \Sigma)^{-1}\big). \tag{D.6}$$

□
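The limit law above has the usual sandwich form $(I_1(\tau)+\Sigma)^{-1}\,\Sigma_1(\tau)\,(I_1(\tau)+\Sigma)^{-1}$. A minimal numeric sketch of assembling such a covariance, with purely hypothetical $2\times 2$ matrices standing in for $I_1(\tau)$, $\Sigma_1(\tau)$, and the penalty-curvature matrix (none of these values come from the paper):

```python
import numpy as np

# Hypothetical stand-ins for I_1(tau), Sigma_1(tau), and the penalty
# curvature Sigma; illustrative only.
I1 = np.array([[2.0, 0.3],
               [0.3, 1.5]])
Sigma1 = np.array([[1.8, 0.2],
                   [0.2, 1.2]])
Sigma_pen = np.zeros((2, 2))   # curvature terms that vanish asymptotically

A = np.linalg.inv(I1 + Sigma_pen)
cov = A @ Sigma1 @ A           # sandwich covariance of sqrt(n)(a1_hat - a10)
print(cov)
```

With `Sigma_pen = 0` the sandwich reduces to $I_1^{-1}\Sigma_1 I_1^{-1}$, the unpenalized limit, which is the oracle-property reading of the theorem.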

E. Proof of Theorem 5

To prove part (a) of Theorem 5, we use proof by contradiction. Suppose that, for sufficiently large $n$, there exists a constant $\eta > 0$ such that, with probability at least $\eta$, there exists a $k_0 > q$ such that $\hat\gamma_{k_0} \ne 0$. Then $\|\hat\gamma_{k_0}\|_{k_0} = \|\hat\beta_{k_0}\|_{L_2} > 0$. Let $\hat\gamma^*$ be the vector constructed by replacing $\hat\gamma_{k_0}$ with $0$ in $\hat\gamma$. Then, by the definition of $\hat\gamma$,

$$\begin{aligned}
PL_n(\hat\gamma) - PL_n(\hat\gamma^*) &= \big\{L_n(\hat\gamma) - L_n(\bar\gamma)\big\} - \big\{L_n(\hat\gamma^*) - L_n(\bar\gamma)\big\} + n\,p_{\lambda_n}\big(\|\hat\gamma_{k_0}\|_{k_0}\big)\\
&= n\big\{(\hat\gamma - \bar\gamma)^T\,\Sigma_n\,(\hat\gamma - \bar\gamma) - (\hat\gamma^* - \bar\gamma)^T\,\Sigma_n\,(\hat\gamma^* - \bar\gamma)\big\}(1 + o_p(1)) + n\,p_{\lambda_n}\big(\|\hat\gamma_{k_0}\|_{k_0}\big).
\end{aligned}\tag{E.1}$$

It then follows from Lemmas B.4-B.6 that the first term on the right-hand side of (E.1) is of order $O_p\big(n\,(\max\{(\lambda_n\rho_n)^{1/2}, a_n^*, (L_{\max}/n)^{1/2}\})^2\big)$. Notice that

$$p_{\lambda_n}\big(\|\hat\gamma_{k_0}\|_{k_0}\big) = p_{\lambda_n}'(0)\,\|\hat\gamma_{k_0}\|_{k_0}\,(1 + o_p(1)) = p_{\lambda_n}'(0)\,O_p\big(\max\big\{(\lambda_n\rho_n)^{1/2},\ a_n^*,\ (L_{\max}/n)^{1/2}\big\}\big).$$

Hence,

$$\begin{aligned}
PL_n(\hat\gamma) - PL_n(\hat\gamma^*) &= n\,O_p\Big(\big(\max\big\{(\lambda_n\rho_n)^{1/2},\ a_n^*,\ (L_{\max}/n)^{1/2}\big\}\big)^2\Big) + n\,p_{\lambda_n}'(0)\,O_p\big(\max\big\{(\lambda_n\rho_n)^{1/2},\ a_n^*,\ (L_{\max}/n)^{1/2}\big\}\big)\\
&= \Big\{O_p\Big(\frac{\max\{(\lambda_n\rho_n)^{1/2},\ a_n^*,\ (L_{\max}/n)^{1/2}\}}{p_{\lambda_n}'(0)}\Big) + 1\Big\}\times n\,p_{\lambda_n}'(0)\,O_p\big(\max\big\{(\lambda_n\rho_n)^{1/2},\ a_n^*,\ (L_{\max}/n)^{1/2}\big\}\big).
\end{aligned}$$

Because $\lambda_n \to 0$ and $\lambda_n/\max\{\rho_n,\ a_n^*,\ (L_{\max}/n)^{1/2}\} \to \infty$ as $n \to \infty$, (13) implies that $PL_n(\hat\gamma) - PL_n(\hat\gamma^*) > 0$ with probability tending to 1. This contradicts the fact that $PL_n(\hat\gamma) - PL_n(\hat\gamma^*) \le 0$. Thus part (a) holds.

Lemmas B.5 and B.6 yield the proof of part (b). □
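Part (a) turns on the tuning condition $\lambda_n/\max\{\rho_n,\ a_n^*,\ (L_{\max}/n)^{1/2}\} \to \infty$. A quick sanity check with assumed rate choices (the exponents below are illustrative, not taken from the paper):

```python
# Illustrative (assumed) rates checking that lambda_n dominates the
# estimation-error scale, as required in the proof of Theorem 5(a).
ratios = []
for n in [1e3, 1e5, 1e7]:
    L_max = n ** 0.2          # spline dimension (assumed)
    rho = L_max ** -2.0       # spline approximation error (assumed)
    a_star = n ** -0.5
    lam = n ** -0.3           # tuning parameter (assumed)
    ratios.append(lam / max(rho, a_star, (L_max / n) ** 0.5))
print(ratios)
```

Under these choices the ratio grows with $n$, so the penalty contribution in (E.1) eventually outweighs the stochastic term and forces $\hat\gamma_{k_0} = 0$; with a too-small $\lambda_n$ the ratio would stay bounded and the contradiction argument would fail.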

F. Proof of Theorem 8

Note that the quantity displayed in (F.1) is the conditional mean of (24) given all observed covariates $\{(X_i, Z_i, W_i)\}_{1\le i\le n}$ and $Y_i(t)$, $i = 1, \ldots, n$. It follows from (24) that the expression given in (F.1) equals zero. Hence, together with (24), we have

$$\begin{aligned}
&\frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\big\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\big\}\,dM_i(t)\,\big(1 + O_p\big(n^{-1/2}\big)\big)\\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\int_0^\tau\big\{U_i^{(1)}(W_i) - \bar U^{(1)}(t)\big\}^{\otimes 2}\,Y_i(t)\,dt\,\big(\hat\gamma^{(1)} - \bar\gamma^{(1)}\big) + \frac{\partial p_{\lambda_n}(\|\hat\gamma_k\|_k)}{\partial\gamma^{(1)}}.
\end{aligned}\tag{F.2}$$

By using a quadratic approximation similar to that in [10], we have

$$p_{\lambda_n}(\|\gamma_k\|_k) \approx p_{\lambda_n}(\|\bar\gamma_k\|_{L_2}) + \frac{p_{\lambda_n}'(\|\bar\gamma_k\|_{L_2})}{2\,\|\bar\gamma_k\|_{L_2}}\big(\|\gamma_k\|_k^2 - \|\bar\gamma_k\|_{L_2}^2\big),$$

so that

$$\frac{\partial p_{\lambda_n}(\|\hat\gamma_k\|_k)}{\partial\gamma^{(1)}} = \mathrm{diag}\Big\{\frac{p_{\lambda_n}'(\|\bar\gamma_k\|_{L_2})}{2\,\|\bar\gamma_k\|_{L_2}}\Big\}\,\big(\hat\gamma^{(1)} - \bar\gamma^{(1)}\big)\,(1 + o_p(1)) = L_{\max}^{-1}\,\Gamma^*\,\big(\hat\gamma^{(1)} - \bar\gamma^{(1)}\big)\,(1 + o_p(1)).$$

Finally, by using arguments similar to the proof of Theorem 4.1 of [26] and the proofs in [24], together with (F.2) and the martingale central limit theorem, we know that

$$\sqrt{\frac{n}{L_{\max}}}\,\big\{d_n^T\,(\Sigma_1 + \Gamma^*)^{-1}\,\Omega^*\,(\Sigma_1 + \Gamma^*)^{-1}\,d_n\big\}^{-1/2}\,d_n^T\big(\hat\gamma^{(1)} - \bar\gamma^{(1)}\big) \xrightarrow{d} N(0, 1) \tag{F.3}$$

holds for any vector $d_n$ with dimension $\sum_{k=0}^{q} L_k$ whose components are not all 0. Then, for any $(q+1)$-dimensional vector $d_n^*$ whose components are not all 0 and for any given $w \in \mathcal W$, choosing $d_n = (B^{(1)}(w))^T d_n^*$ yields

$$\sqrt{\frac{n}{L_{\max}}}\,\big\{d_n^{*T}\,B^{(1)}(w)\,(\Sigma_1 + \Gamma^*)^{-1}\,\Omega^*\,(\Sigma_1 + \Gamma^*)^{-1}\,(B^{(1)}(w))^T\,d_n^*\big\}^{-1/2}\,d_n^{*T}\big\{\hat\beta^{(1)}(w) - \bar\beta^{(1)}(w)\big\} \xrightarrow{d} N(0, 1), \tag{F.4}$$

which implies that (25) holds. □

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Ma's research was supported by the National Natural Science Foundation of China (NSFC) Grant no. 11301424 and the Fundamental Research Funds for the Central Universities Grant no. JBK120405.

References

[1] J. Huang, "Efficient estimation of the partly linear additive Cox model," The Annals of Statistics, vol. 27, no. 5, pp. 1536-1563, 1999.

[2] J. Cai, J. Fan, J. Jiang, and H. Zhou, "Partially linear hazard regression for multivariate survival data," Journal of the American Statistical Association, vol. 102, no. 478, pp. 538-551, 2007.

[3] J. Fan, H. Lin, and Y. Zhou, "Local partial-likelihood estimation for lifetime data," The Annals of Statistics, vol. 34, no. 1, pp. 290-325, 2006.

[4] L. Tian, D. Zucker, and L. J. Wei, "On the Cox model with time-varying regression coefficients," Journal of the American Statistical Association, vol. 100, no. 469, pp. 172-183, 2005.

[5] G. S. Yin, H. Li, and D. L. Zeng, "Partially linear additive hazards regression with varying coefficients," Journal of the American Statistical Association, vol. 103, no. 483, pp. 1200-1213, 2008.

[6] J. Fan and R. Li, "Variable selection for Cox's proportional hazards model and frailty model," The Annals of Statistics, vol. 30, no. 1, pp. 74-99, 2002.

[7] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348-1360, 2001.

[8] B. A. Johnson, D. Y. Lin, and D. Zeng, "Penalized estimating functions and variable selection in semiparametric regression models," Journal of the American Statistical Association, vol. 103, no. 482, pp. 672-680, 2008.

[9] B. A. Johnson, "Variable selection in semiparametric linear regression with censored data," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 2, pp. 351-370, 2008.

[10] L. Wang, H. Li, and J. Z. Huang, "Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements," Journal of the American Statistical Association, vol. 103, no. 484, pp. 1556-1569, 2008.

[11] P. Du, S. Ma, and H. Liang, "Penalized variable selection procedure for Cox models with semiparametric relative risk," The Annals of Statistics, vol. 38, no. 4, pp. 2092-2117, 2010.

[12] D. Y. Lin and Z. Ying, "Semiparametric analysis of the additive risk model," Biometrika, vol. 81, no. 1, pp. 61-71, 1994.

[13] F. W. Huffer and I. W. McKeague, "Weighted least squares estimation for Aalen's additive risk model," Journal of the American Statistical Association, vol. 86, no. 413, pp. 114-129, 1991.

[14] O. O. Aalen, "A linear regression model for the analysis of life times," Statistics in Medicine, vol. 8, no. 8, pp. 907-925, 1989.

[15] T. H. Scheike, "The additive nonparametric and semiparametric Aalen model as the rate function for a counting process," Lifetime Data Analysis, vol. 8, no. 3, pp. 247-262, 2002.

[16] I. W. McKeague and P. D. Sasieni, "A partly parametric additive risk model," Biometrika, vol. 81, no. 3, pp. 501-514, 1994.

[17] H. Li, G. Yin, and Y. Zhou, "Local likelihood with time-varying additive hazards model," The Canadian Journal of Statistics, vol. 35, no. 2, pp. 321-337, 2007.

[18] J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, John Wiley & Sons, Hoboken, NJ, USA, 2nd edition, 2002.

[19] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1996.

[20] L. L. Schumaker, Spline Functions: Basic Theory, Cambridge University Press, Cambridge, UK, 3rd edition, 2007.

[21] C. J. Stone, "Optimal global rates of convergence for nonparametric regression," The Annals of Statistics, vol. 10, no. 4, pp. 1040-1053, 1982.

[22] J. Fan and T. Huang, "Profile likelihood inferences on semiparametric varying-coefficient partially linear models," Bernoulli, vol. 11, no. 6, pp. 1031-1057, 2005.

[23] W. A. Knaus, F. E. Harrell Jr., J. Lynn et al., "The SUPPORT prognostic model. Objective estimates of survival for seriously ill hospitalized adults," Annals of Internal Medicine, vol. 122, no. 3, pp. 191-203, 1995.

[24] J. Z. Huang, C. O. Wu, and L. Zhou, "Polynomial spline estimation and inference for varying coefficient models with longitudinal data," Statistica Sinica, vol. 14, no. 3, pp. 763-788, 2004.

[25] J. Cai, J. Fan, H. Zhou, and Y. Zhou, "Hazard models with varying coefficients for multivariate failure time data," The Annals of Statistics, vol. 35, no. 1, pp. 324-354, 2007.

[26] J. Z. Huang, "Local asymptotics for polynomial spline regression," The Annals of Statistics, vol. 31, no. 5, pp. 1600-1635, 2003.
