Handbook of Econometrics, Vols. 1-5: Chapter 41

PDF, 79 pages, 9 MB


Chapter 41

ESTIMATION OF SEMIPARAMETRIC MODELS*

JAMES L. POWELL
Princeton University

Contents

Abstract
1. Introduction
   1.1. Overview
   1.2. Definition of “semiparametric”
   1.3. Stochastic restrictions and structural models
   1.4. Objectives and techniques of asymptotic theory
2. Stochastic restrictions
   2.1. Conditional mean restriction
   2.2. Conditional quantile restrictions
   2.3. Conditional symmetry restrictions
   2.4. Independence restrictions
   2.5. Exclusion and index restrictions
3. Structural models
   3.1. Discrete response models
   3.2. Transformation models
   3.3. Censored and truncated regression models
   3.4. Selection models
   3.5. Nonlinear panel data models
4. Summary and conclusions
References

*This work was supported by NSF Grants 91-96185 and 92-10101 to Princeton University. I am grateful to Hyungtaik Ahn, Moshe Buchinsky, Gary Chamberlain, Songnian Chen, Gregory Chow, Angus Deaton, Bo Honoré, Joel Horowitz, Oliver Linton, Robin Lumsdaine, Chuck Manski, Rosa Matzkin, Dan McFadden, Whitney Newey, Paul Ruud, and Tom Stoker for their helpful suggestions, which were generally adopted except when they were mutually contradictory or required a lot of extra work.

Handbook of Econometrics, Volume IV, Edited by R.F. Engle and D.L. McFadden
© 1994 Elsevier Science B.V. All rights reserved

Abstract

A semiparametric model for observational data combines a parametric form for some component of the data generating process (usually the behavioral relation between the dependent and explanatory variables) with weak nonparametric restrictions on the remainder of the model (usually the distribution of the unobservable errors). This chapter surveys some of the recent literature on semiparametric methods, emphasizing microeconometric applications using limited dependent variable models.
An introductory section defines semiparametric models more precisely and reviews the techniques used to derive the large-sample properties of the corresponding estimation methods. The next section describes a number of weak restrictions on error distributions (conditional mean, conditional quantile, conditional symmetry, independence, and index restrictions) and shows how they can be used to derive identifying restrictions on the distributions of observables. This general discussion is followed by a survey of a number of specific estimators proposed for particular econometric models, and the chapter concludes with a brief account of applications of these methods in practice.

1. Introduction

1.1. Overview

Semiparametric modelling is, as its name suggests, a hybrid of the parametric and nonparametric approaches to the construction, fitting, and validation of statistical models. To place semiparametric methods in context, it is useful to review the way these other approaches are used to address a generic microeconometric problem, namely, determination of the relationship of a dependent variable (or variables) $y$ to a set of conditioning variables $x$ given a random sample $\{z_i = (y_i, x_i),\ i = 1, \ldots, N\}$ of observations on $y$ and $x$. This would be considered a “micro”-econometric problem because the observations are mutually independent and the dimension of the conditioning variables $x$ is finite and fixed. In a “macro”-econometric application using time series data, the analysis must also account for possible serial dependence in the observations, which is usually straightforward, and for a growing or infinite number of conditioning variables, e.g. past values of the dependent variable $y$, which may be more difficult to accommodate.
Even for microeconometric analyses of cross-sectional data, distributional heterogeneity and dependence due to clustering and stratification must often be considered; still, while the random sampling assumption may not be typical, it is a useful simplification, and adaptation of statistical methods to non-random sampling is usually straightforward.

In the classical parametric approach to this problem, it is typically assumed that the dependent variable is functionally dependent on the conditioning variables (“regressors”) and unobservable “errors” according to a fixed structural relation of the form

$$y = g(x, \alpha_0, \varepsilon), \tag{1.1}$$

where the structural function $g(\cdot)$ is known but the finite-dimensional parameter vector $\alpha_0 \in \mathbb{R}^p$ and the error term $\varepsilon$ are unobserved. The form of $g(\cdot)$ is chosen to give a class of simple and interpretable data generating mechanisms which embody the relevant restrictions imposed by the characteristics of the data (e.g. $g(\cdot)$ is dichotomous if $y$ is binary) and/or economic theory (monotonicity, homotheticity, etc.). The error terms $\varepsilon$ are introduced to account for the lack of perfect fit of (1.1) for any fixed values of $x$ and $\alpha$, and are variously interpreted as expectational or optimization errors, measurement errors, unobserved differences in tastes or technology, or other omitted or unobserved conditioning variables; their interpretation influences the way they are incorporated into the structural function $g(\cdot)$.

To prevent (1.1) from holding tautologically for any value of $\alpha_0$, the stochastic behavior of the error terms must be restricted. The parametric approach takes the error distribution to belong to a finite-dimensional family of distributions,

$$\Pr\{\varepsilon \le \lambda \mid x\} = \int_{-\infty}^{\lambda} f_\varepsilon(u \mid x, \eta_0)\, d\mu_\varepsilon, \tag{1.2}$$

where $f(\cdot)$ is a known density (with respect to the dominating measure $\mu_\varepsilon$) except for an unknown, finite-dimensional “nuisance” parameter $\eta_0$.
Given the assumed structural model (1.1) and the conditional error distribution (1.2), the conditional distribution of $y$ given $x$ can be derived,

$$\Pr\{y \le \lambda \mid x\} = \int_{-\infty}^{\lambda} f_{y|x}(u \mid x, \alpha_0, \eta_0)\, d\mu_{y|x},$$

for some parametric conditional density $f_{y|x}(\cdot)$. Of course, it is usually possible to posit this conditional distribution of $y$ given $x$ directly, without recourse to unobservable “error” terms, but the adequacy of an assumed functional form is generally assessed with reference to an implicit structural model. In any case, with this conditional density, the unknown parameters $\alpha_0$ and $\eta_0$ can be estimated by maximizing the average conditional log-likelihood

$$\mathcal{L}_N(\alpha, \eta) = \frac{1}{N} \sum_{i=1}^{N} \ln f_{y|x}(y_i \mid x_i, \alpha, \eta).$$

This fully parametric modelling strategy has a number of well-known optimality properties. If the specifications of the structural equation (1.1) and error distribution (1.2) are correct (and other mild regularity conditions hold), the maximum likelihood estimators of $\alpha_0$ and $\eta_0$ will converge to the true parameters at the rate of the inverse square root of the sample size (“root-N-consistent”) and will be asymptotically normally distributed, with an asymptotic covariance matrix which is no larger than that of any other regular root-N-consistent estimator. Moreover, the parameter estimates yield a precise estimator of the conditional distribution of the dependent variable given the regressors, which might be used to predict $y$ for values of $x$ which fall outside the observed support of the regressors. The drawback to parametric modelling is the requirement that both the structural model and the error distribution are correctly specified. Correct specification may be particularly difficult for the error distribution, which represents the unpredictable component of the relation of $y$ to $x$.
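As a concrete illustration of the fully parametric strategy, the sketch below assumes (an illustrative choice, not one made in the chapter) a binary response version of (1.1), $y = 1\{x'\alpha_0 + \varepsilon > 0\}$, with $\varepsilon$ standard logistic and independent of $x$, so that $f_{y|x}$ is the logit likelihood. It maximizes the average conditional log-likelihood by Newton-Raphson; the helper names and the simulated design are hypothetical, and numpy is assumed to be available.

```python
import numpy as np

def simulate_binary(n, alpha, rng):
    """Draw (y, X) from the hypothetical design y = 1{x'alpha + eps > 0}, eps ~ logistic."""
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])        # intercept plus one regressor
    eps = rng.logistic(size=n)                  # parametric error law, independent of x
    y = (X @ alpha + eps > 0).astype(float)     # structural relation y = g(x, alpha, eps)
    return y, X

def avg_loglik(alpha, y, X):
    """Average conditional log-likelihood (1/N) * sum_i ln f(y_i | x_i, alpha) for the logit."""
    index = X @ alpha
    # ln Pr{y=1|x} = -ln(1 + e^{-index}); ln Pr{y=0|x} = -ln(1 + e^{index})
    return np.mean(-y * np.logaddexp(0.0, -index) - (1.0 - y) * np.logaddexp(0.0, index))

def logit_mle(y, X, tol=1e-10, max_iter=50):
    """Maximize avg_loglik by Newton-Raphson (the logit log-likelihood is concave)."""
    alpha = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ alpha)))            # fitted Pr{y = 1 | x}
        score = X.T @ (y - p) / len(y)                    # gradient of avg_loglik
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X / len(y)
        step = np.linalg.solve(hessian, score)
        alpha = alpha - step                              # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha

rng = np.random.default_rng(0)
alpha_true = np.array([0.5, -1.0])
y, X = simulate_binary(20_000, alpha_true, rng)
alpha_hat = logit_mle(y, X)
print(alpha_hat)   # should lie near alpha_true for a sample this large
```

Because the logit log-likelihood is globally concave, Newton-Raphson converges from the zero starting value here; if the true error law were not logistic, the same code would still run, but its limit need not equal $\alpha_0$.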
Unfortunately, if $g(x, \alpha, \varepsilon)$ is fundamentally nonlinear in $\varepsilon$ (that is, if it is noninvertible in $\varepsilon$ or has a Jacobian that depends on the unknown parameters $\alpha$), then misspecification of the functional form of the error distribution $f(\varepsilon \mid x, \eta)$ generally yields inconsistency of the MLE and inconsistent estimates of the conditional distribution of $y$ given $x$.

At the other extreme, a fully nonparametric approach to modelling the relation between $y$ and $x$ would define any such “relation” as a characteristic of the joint distribution of $y$ and $x$, which would be the primitive object of interest. A “causal” or predictive relation from the regressors to the dependent variable would be given as a particular functional of the conditional distribution of $y$ given $x$,

$$g(x) = T(F_{y|x}), \tag{1.3}$$

where $F_{yx}$ is the joint and $F_{y|x}$ is the conditional distribution. Usually the functional $T(\cdot)$ is a location measure, in which case the relation between $y$ and $x$ has a representation analogous to (1.1) and (1.2), but with unknown functional forms for $f(\cdot)$ and $g(\cdot)$. For example, if $g(x)$ is the mean regression function ($T(F_{y|x}) = E[y \mid x]$), then $y$ can be written as $y = g(x) + \varepsilon$, with $\varepsilon$ defined to have conditional density $f_{\varepsilon|x}$ assumed to satisfy only the normalization $E[\varepsilon \mid x] = 0$. In this approach the interpretation of the error term $\varepsilon$ is different from that in the parametric approach; its stochastic properties derive from its definition in terms of the functional $g(\cdot)$ rather than from a prior behavioral assumption.

Estimation of the function $g(\cdot)$ is straightforward once a suitable estimator $\hat{F}_{y|x}$ of the conditional distribution of $y$ given $x$ is obtained; if the functional $T(\cdot)$ in (1.3) is well-behaved (i.e. continuous over the space of possible $F_{y|x}$), a natural estimator is $\hat{g}(x) = T(\hat{F}_{y|x})$. Thus the problem of estimating the “relationship” $g(\cdot)$ reduces to the problem of estimating the conditional distribution function, which generally requires some smoothing across adjacent observations of the regressors $x$ when some components are continuously distributed (see, e.g., Prakasa Rao (1983), Silverman (1986), Bierens (1987), Härdle (1991)). In some cases, the functional $T(\cdot)$ might be a well-defined functional of the empirical c.d.f. of the data (for example, $g(x)$ might be the best linear projection of $y$ on $x$, which depends only on the covariance matrix of the data); in these cases smoothing of the empirical c.d.f. will not be required. An alternative estimation strategy would approximate $g(x)$ and the conditional distribution of $\varepsilon$ by a sequence of parametric models, with the number of parameters expanding as the sample size increases; this approach, termed the “method of sieves” by Grenander (1981), is closely related to the “seminonparametric” modelling approach of Gallant (1981, 1987), Elbadawi et al. (1983) and Gallant and Nychka (1987).

The advantages and disadvantages of the nonparametric approach are the opposite of those for parametric modelling. Nonparametric modelling typically imposes few restrictions on the form of the joint distribution of the data (like smoothness or monotonicity), so there is little room for misspecification, and consistency of an estimator of $g(x)$ is established under much more general conditions than for parametric modelling. On the other hand, the precision of estimators which impose only nonparametric restrictions is often poor. When estimation of $g(x)$ requires smoothing of the empirical c.d.f. of the data, the convergence rate of the estimator is usually slower than the parametric rate (square root of the sample size), due to the bias caused by the smoothing (see the chapter by Härdle and Linton in this volume).
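The smoothing step can be illustrated with a Nadaraya-Watson (local-constant kernel) estimator of the mean regression $g(x) = E[y \mid x]$. This is a minimal sketch: the Gaussian kernel, the bandwidth $h$, and the simulated design with $g(x) = \sin(x)$ are illustrative assumptions, not choices made in the chapter.

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Local-constant kernel estimate of g(x) = E[y | x] at each point of x_eval.

    Uses a Gaussian kernel; h is the bandwidth that controls the amount of smoothing.
    """
    u = (np.asarray(x_eval)[:, None] - x[None, :]) / h   # scaled distances, shape (m, n)
    w = np.exp(-0.5 * u**2)                                # kernel weights (normalizing constant cancels)
    return (w @ y) / w.sum(axis=1)                         # weighted average of nearby y values

rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-2.0, 2.0, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)    # hypothetical design: g(x) = sin(x), E[eps | x] = 0
grid = np.linspace(-1.5, 1.5, 7)
g_hat = nadaraya_watson(grid, x, y, h=0.15)
print(np.round(g_hat - np.sin(grid), 3))    # estimation errors at the grid points
```

Shrinking $h$ reduces the smoothing bias but raises the variance, which is the trade-off behind the slower-than-root-N convergence rate mentioned above.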
Moreover, although some prior economic restrictions like homotheticity and monotonicity can be incorporated into the nonparametric approach (as described in the chapter by Matzkin in this volume), the definition of the “relation” is statistical, not economic. Extrapolation of the relationship outside the observed support of the regressors is not generally possible with a nonparametric model, which is analogous to a “reduced form” in the classical terminology of simultaneous equations modelling.

The semiparametric approach, the subject of this chapter, distinguishes between the “parameters of interest”, which are finite-dimensional, and infinite-dimensional “nuisance parameters”, which are treated nonparametrically. (When the “parameter of interest” is infinite-dimensional, like the baseline hazard in a proportional hazards model, the nonparametric methods described in the Härdle and Linton chapter are more appropriate.) In a typical parametric model, the parameters of interest, $\alpha_0$, appear only in a structural equation analogous to (1.1), while the conditional error distribution is treated as a nuisance parameter, subject to certain prior restrictions. More generally, unknown nuisance functions may also appear in the structural equation. Semiparametric analogues to equations (1.1) and (1.2) are

$$y = g(x, \alpha_0, \varepsilon, \tau_0(\cdot)), \tag{1.4}$$

$$\Pr\{\varepsilon \le \lambda \mid x\} = \int 1\{u \le \lambda\}\, f_0(u \mid x)\, d\mu_\varepsilon, \tag{1.5}$$

where, as before, $\alpha_0$ is unknown but known to lie in a finite-dimensional Euclidean subspace, and where the unknown nuisance parameter is $\eta_0 = (f_0(\cdot), \tau_0(\cdot))$. As with the parametric approach, prior economic reasoning is used to restrict the form of the structural function $g(\cdot)$, while general regularity and identification restrictions are imposed on the nuisance parameters $\eta_0$, as in the nonparametric approach.

As a hybrid of the parametric and nonparametric approaches, semiparametric modelling shares the advantages and disadvantages of each.
Because it allows a more general specification of the nuisance parameters, estimators of the parameters of interest for semiparametric models are consistent under a broader range of conditions than for parametric models, and these estimators are usually more precise (converging to the true values at the parametric, root-N rate) than their nonparametric counterparts. On the other hand, estimators for semiparametric models are generally less efficient than maximum likelihood estimators for a correctly-specified parametric model, and are still sensitive to misspecification of the structural function or other parametric components of the model.

This chapter will survey the econometric literature on semiparametric estimation, with emphasis on a particular class of models, nonlinear latent variable models, which have been the focus of most of the attention in this literature. The remainder of Section 1 more precisely defines the “semiparametric” categorization, briefly lists the structural functions and error distributions to be considered, and reviews the techniques for obtaining large-sample approximations to the distributions of various types of estimators for semiparametric models. The next section discusses how each of the semiparametric restrictions on the behavior of the error terms can be used to construct estimators for certain classes of structural functions. Section 3 then surveys existing results in the econometric literature for several groups of latent variable models, with a variety of error restrictions for each group of structural models. A concluding section summarizes this literature and suggests topics for further work.

The coverage of the large literature on semiparametric estimation in this chapter will necessarily be incomplete; fortunately, other general references on the subject are available. A forthcoming monograph by Bickel et al. (1993) discusses much of the work on semiparametrics in the statistical literature, with special attention to construction of efficient estimators; a monograph by Manski (1988b) discusses the analogous econometric literature. Other surveys of the econometric literature include those by Robinson (1988a) and Stoker (1992), the latter giving an extensive treatment of estimation based upon index restrictions, as described in Section 2.5 below. Newey (1990a) surveys the econometric literature on semiparametric efficiency bounds, which is not covered extensively in this chapter. Finally, given the close connection between the semiparametric approach and parametric and […] say, to different methods and degrees of “smoothing” of the empirical c.d.f.), while estimation of a semiparametric model would require an additional choice of the particular functional $T^*$ upon which to base the estimates.

On a related point, while it is common to refer to “semiparametric estimation” and “semiparametric estimators”, this is somewhat misleading terminology. Some authors use the term “semiparametric estimator” to denote a statistic which involves a preliminary “plug-in” estimator of a nonparametric component (see, for example, Andrews’ chapter in this volume); this leads to some semantic ambiguities, since the parameters of many semiparametric models can be estimated by “parametric” estimators and vice versa. Thus, though certain estimators would be hard to interpret in a parametric or nonparametric context, in general the term “semiparametric”, like “parametric” or “nonparametric”, will be used in this chapter to refer to classes of structural models and stochastic restrictions, and not to a particular statistic. In many cases, the same estimator can be viewed as parametric, nonparametric or semiparametric, depending on the assumptions of the model.
For example, for the classical linear model $y = x'\beta_0 + \varepsilon$, the least squares estimator

$$\hat{\beta} = \Big[\sum_{i=1}^{N} x_i x_i'\Big]^{-1} \sum_{i=1}^{N} x_i y_i$$

of the unknown coefficients $\beta_0$ would be considered a “parametric” estimator when the error terms are assumed to be Gaussian with zero mean and distributed independently of the regressors $x$. With these assumptions $\hat{\beta}$ is the maximum likelihood estimator of $\beta_0$, and thus is asymptotically efficient relative to all regular estimators of $\beta_0$. Alternatively, the least squares estimator arises in the context of a linear prediction problem, where the error term $\varepsilon$ has a density which is assumed to satisfy the unconditional moment restriction $E[\varepsilon \cdot x] = 0$. This restriction yields a unique representation for $\beta_0$ in terms of the joint distribution of the data,

$$\beta_0 = \{E[x \cdot x']\}^{-1} E[x \cdot y],$$

so estimation of $\beta_0$ in this context would be considered a “nonparametric” problem by the criteria given above. Though other, less precise estimators of the moments $E[x \cdot x']$ and $E[x \cdot y]$ (say, based only on a subset of the observations) might be used to define alternative estimators, the classical least squares estimator $\hat{\beta}$ is, almost by default, an “efficient” estimator of $\beta_0$ in this model (as Levit (1975) makes precise). Finally, the least squares estimator $\hat{\beta}$ can be viewed as a special case of the broader class of weighted least squares estimators of $\beta_0$ when the error terms $\varepsilon$ are assumed to have conditional mean zero, $E[\varepsilon \mid x_i] = 0$ a.s. The model defined by this restriction would be considered “semiparametric”, since $\beta_0$ is overidentified; while the least squares estimator $\hat{\beta}$ is $\sqrt{N}$-consistent and asymptotically normal for this model (assuming the relevant second moments are finite), it is inefficient in general, with an efficient estimator being based on the representation

$$\beta_0 = \{E[\sigma^{-2}(x) \cdot x \cdot x']\}^{-1} E[\sigma^{-2}(x) \cdot x \cdot y]$$

of the parameters of interest, where $\sigma^2(x) \equiv \mathrm{Var}(\varepsilon \mid x)$ (as discussed in Section 2.1 below).
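The contrast between least squares and the efficient weighted estimator under $E[\varepsilon \mid x] = 0$ can be checked by simulation. The sketch below assumes a heteroskedastic design with $\sigma^2(x) = 0.1 + x^2$ and treats $\sigma^2(x)$ as known, so the weighted estimator is infeasible in practice; estimating $\sigma^2(\cdot)$ is the nonparametric step a feasible version would need. The design values and helper names are hypothetical.

```python
import numpy as np

BETA = np.array([1.0, 2.0])   # hypothetical true coefficients (intercept, slope)

def ols(X, y):
    """Least squares: [sum x_i x_i']^{-1} sum x_i y_i."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def wls(X, y, sigma2):
    """Sample analogue of {E[x x' / sigma^2(x)]}^{-1} E[x y / sigma^2(x)]."""
    Xw = X / sigma2[:, None]                       # rows x_i' / sigma^2(x_i)
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def one_draw(rng, n=500):
    """One simulated sample; returns (OLS estimate, WLS estimate)."""
    x = rng.uniform(0.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x])
    sigma2 = 0.1 + x**2                            # Var(eps | x), treated as known here
    eps = np.sqrt(sigma2) * rng.normal(size=n)     # E[eps | x] = 0 but heteroskedastic
    y = X @ BETA + eps
    return ols(X, y), wls(X, y, sigma2)

rng = np.random.default_rng(2)
draws = [one_draw(rng) for _ in range(1_000)]
b_ols = np.array([d[0] for d in draws])
b_wls = np.array([d[1] for d in draws])
print("mean OLS:", b_ols.mean(axis=0), " mean WLS:", b_wls.mean(axis=0))
print("sd OLS:  ", b_ols.std(axis=0), " sd WLS:  ", b_wls.std(axis=0))
```

Both estimators should be centred near $\beta_0$, with the weighted estimator showing the smaller Monte Carlo spread, in line with the efficiency ranking described in the text.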
In this context, the least squares statistic $\hat{\beta}$ is a “semiparametric” estimator because of the restrictions imposed on the model, not because of the form of the estimator.

Two categories of estimators which are related to “semiparametric estimators”, but logically distinct, are “robust” and “adaptive” estimators. The term “robustness” is used informally to denote statistical procedures which are well-behaved for slight misspecifications of the model. More formally, a robust estimator $\hat{\alpha} = T(\hat{F}_{yx})$ can be defined as one for which $T(F)$ is a continuous functional at the true model (e.g. Manski (1988b)), or whose asymptotic distribution is continuous at the truth (“quantitative robustness”, as defined by Huber (1981)). Other notions of robustness involve the sensitivity of particular estimators to changes in a small fraction of the observations. While “semiparametric estimators” are designed to be well-behaved under weak conditions on the error distribution and other nuisance parameters (which are assumed to be correct), robust estimators are designed to be relatively efficient for correctly-specified models but also relatively insensitive to “slight” model misspecification. Robustness of an estimator is related to the boundedness (and continuity) of its influence function, defined in Section 1.4 below; whether a particular semiparametric model admits a robust estimator depends upon the particular restrictions imposed. For example, under the conditional mean restrictions described in Section 2.1 below, the influence functions for semiparametric estimators will be linear (and thus unbounded) functions of the error terms, so robust estimation is infeasible under this restriction. On the other hand, the influence function for estimators under conditional quantile restrictions depends upon the error terms only through their signs, so quantile estimators are generally “robust” (at least with respect to outlying errors) as well as “semiparametric”.
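The link between bounded influence and robustness can be seen through the empirical sensitivity curve, $(n+1)[T(\text{sample} \cup \{e\}) - T(\text{sample})]$, a finite-sample analogue of the influence function. In this sketch (an illustrative assumption: standard normal errors and an intercept-only model), the sample mean stands in for an estimator under a conditional mean restriction and the sample median for one under a conditional median restriction; the helper name is hypothetical.

```python
import numpy as np

def sensitivity_curve(stat, sample, e_values):
    """Empirical influence (n + 1) * [T(sample with e added) - T(sample)] for each e."""
    n = len(sample)
    base = stat(sample)
    return np.array([(n + 1) * (stat(np.append(sample, e)) - base) for e in e_values])

rng = np.random.default_rng(3)
sample = rng.normal(size=1_000)                       # hypothetical symmetric errors
e_values = np.array([-1e3, -10.0, -1.0, 1.0, 10.0, 1e3])

sc_mean = sensitivity_curve(np.mean, sample, e_values)      # linear, unbounded in e
sc_median = sensitivity_curve(np.median, sample, e_values)  # flat once e clears the middle of the sample
print(np.round(sc_mean, 2))
print(np.round(sc_median, 2))
```

For the mean the curve equals $e - \bar{\varepsilon}$ exactly, so it grows without bound; for the median it is flat outside the gap between the two middle order statistics, taking one value for large positive $e$ and another for large negative $e$.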
“Adaptive” estimators are efficient estimators of certain semiparametric models for which the best attainable efficiency for estimation of the parameters of interest does not depend upon prior knowledge of a parametric form for the nuisance parameters. That is, adaptive estimators are consistent under the semiparametric restrictions but as efficient (asymptotically) as a maximum likelihood estimator when the (infinite-dimensional) nuisance parameter is known to lie in a finite-dimensional parametric family. Adaptive estimation is possible only if the semiparametric information bound for attainable efficiency for the parameters of interest is equal to the analogous Cramér-Rao bound for any feasible parametric specification of the nuisance parameter. Adaptive estimators, which are described in more detail by Bickel et al. (1993) and Manski (1988b), involve explicit estimation of (nonparametric) nuisance parameters, as do efficient estimators for semiparametric models more generally.

1.3. Stochastic restrictions and structural models

As discussed above, a semiparametric model for the relationship between $y$ and $x$ will be determined by the parametric form of the structural function $g(\cdot)$ of (1.4) and the restrictions imposed on the error distribution and any other infinite-dimensional component of the model. The following sections of this chapter group semiparametric models by the restrictions imposed on the error distribution, describing estimation under these restrictions for a number of different structural models. A brief description of the restrictions to be considered, followed by a discussion of the structural models, is given in this section.

A semiparametric restriction on $\varepsilon$ which is quite familiar in econometric theory and practice is a (constant) conditional mean restriction, where it is assumed that

$$E[\varepsilon \mid x] = \mu_0 \tag{1.6}$$

for some unknown constant $\mu_0$, which is usually normalized to zero to ensure identification of an intercept term.
(Here and throughout, all conditional expectations are assumed to hold for a set of regressors $x$ with probability one.) This restriction is the basis for much of the large-sample theory for least squares and method-of-moments estimation, and estimators derived for assumed Gaussian distributions of $\varepsilon$ (or, more generally, for error distributions in an exponential family) are often well-behaved under this weaker restriction.

A restriction which is less familiar but gaining increasing attention in econometric practice is a (constant) conditional quantile restriction, under which a scalar error term $\varepsilon$ is assumed to satisfy

$$\Pr\{\varepsilon \le \eta_0 \mid x\} = \pi \tag{1.7}$$

for some fixed proportion $\pi \in (0, 1)$ and constant $\eta_0 = \eta_0(\pi)$; a conditional median restriction is the (leading) special case with $\pi = 1/2$. Rewriting the conditional