Handbook of Econometrics Vols1-5 _ Chapter 43


Chapter 43

ANALOG ESTIMATION OF ECONOMETRIC MODELS*

CHARLES F. MANSKI
University of Wisconsin-Madison

Contents

Abstract
1. Introduction
2. Preliminaries
  2.1. The analogy principle
  2.2. Moment problems
  2.3. Econometric models
3. Method-of-moments estimation of separable models
  3.1. Mean independence
  3.2. Median independence
  3.3. Conditional symmetry
  3.4. Variance independence
  3.5. Statistical independence
  3.6. A historical note
4. Method-of-moments estimation of response models
  4.1. Likelihood models
  4.2. Invertible models
  4.3. Mean independent linear models
  4.4. Quantile independent monotone models
5. Estimation of general separable and response models
  5.1. Closest-empirical-distribution estimation of separable models
  5.2. Minimum-distance estimation of response models
6. Conclusion
References

*I am grateful for the comments of Rosa Matzkin and Jim Powell.

Handbook of Econometrics, Volume IV, Edited by R.F. Engle and D.L. McFadden
© 1994 Elsevier Science B.V. All rights reserved

Abstract

Suppose that one wants to estimate a parameter characterizing some feature of a specified population. One has some prior information about the population and a random sample of observations. A widely applicable approach is to estimate the parameter by a sample analog; that is, by a statistic having the same properties in the sample as the parameter does in the population. If there is no such statistic, then one may choose an estimate that, in some well-defined sense, makes the known properties of the population hold as closely as possible in the sample. These are analog estimation methods. This chapter surveys some uses of analog methods to estimate two classes of econometric models, the separable and the response models.

1. Introduction

Suppose that one wants to estimate a parameter characterizing some feature of a specified population.
One has some prior information about the population and a random sample of observations. A widely applicable approach is to estimate the parameter by a sample analog; that is, by a statistic having the same properties in the sample as the parameter does in the population. If there is no such statistic, then one may choose an estimate that, in some well-defined sense, makes the known properties of the population hold as closely as possible in the sample. These are analog estimation methods.

Familiar examples include use of the sample average to estimate the population mean and sample quantiles to estimate population quantiles. The classical method of moments (Pearson (1894)) is an analog approach, as is minimum chi-square estimation (Neyman (1949)). Maximum likelihood, least squares and least absolute deviations estimation are analog methods.

This chapter surveys some uses of analog methods to estimate econometric models. Section 2 presents the necessary preliminaries, defining the analogy principle, moment problems and the method of moments, and two classes of models, the separable and the response models. Sections 3 and 4 describe the variety of separable and response models that imply moment problems and may be estimated by the method of moments. Section 5 discusses two more general analog estimation approaches: closest empirical distribution estimation of separable models and minimum distance estimation of response models. Section 6 gives conclusions. The reader wishing a more thorough treatment of much of the material in this chapter should see Manski (1988).

The analogy principle is used here to estimate population parameters. Other chapters of this handbook exploit related ideas for other purposes. The chapter by Hall describes bootstrap methods, which apply the analogy principle to approximate the distribution of sample statistics. The chapter by Hajivassiliou and Ruud describes simulation methods, which use the analogy between an observed sample and a pseudo-sample from the same population, drawn at postulated parameter values.

2. Preliminaries

2.1. The analogy principle

Assume that a probability distribution $P$ on a sample space $Z$ characterizes a population. One observes a sample of $N$ independent realizations of a random variable $z$ distributed $P$. One knows that $P$ is a member of some family $\Pi$ of probability distributions on $Z$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$T(P, b) = 0, \tag{1}$$

where $T(\cdot, \cdot)$ is a given function mapping $\Pi \times B$ into some vector space $\Psi$. The problem is to combine the sample data with the knowledge that $b \in B$, $P \in \Pi$ and $T(P, b) = 0$ so as to estimate $b$.

Many econometric models imply that a parameter solves an extremum problem rather than an equation. We can use (1) to express extremum problems by saying that $b$ solves

$$b - \arg\min_{c \in B} W(P, c) = 0. \tag{2}$$

Here $W(\cdot, \cdot)$ is a given function mapping $\Pi \times B$ into the real line.

Let $P_N$ be the empirical distribution of the sample of $N$ draws from $P$. That is, $P_N$ is the multinomial probability distribution that places probability $1/N$ on each of the $N$ observations of $z$. The group of theorems collectively referred to as the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$. This suggests that to estimate $b$ one might substitute the function $T(P_N, \cdot)$ for $T(P, \cdot)$ and use

$$B_N = [c \in B: T(P_N, c) = 0]. \tag{3}$$

This defines the analog estimate when $P_N$ is a feasible value for $P$; that is, when $P_N \in \Pi$. In these cases $T(P_N, \cdot)$ is well-defined and has at least one zero in $B$, so $B_N$ is the (possibly set-valued) analog estimate of $b$.

Equation (3) does not explain how to proceed when $P_N \notin \Pi$. We have so far defined $T(\cdot, \cdot)$ only on the space $\Pi \times B$ of feasible population distributions and parameter values. The function $T(P_N, \cdot)$ is as yet undefined for $P_N \notin \Pi$.

Let $\Phi$ denote the space of all multinomial distributions on $Z$. To define $T(P_N, \cdot)$
for every sample size and all sample realizations, it suffices to extend $T(\cdot, \cdot)$ from $\Pi \times B$ to the domain $(\Pi \cup \Phi) \times B$. Two approaches have proved useful in practice.

Mapping $P_N$ into $\Pi$. One approach is to map $P_N$ into $\Pi$. Select a function $\pi(\cdot): \Pi \cup \Phi \to \Pi$ which maps every member of $\Pi$ into itself. Now replace the equation $T(P, b) = 0$ with

$$T[\pi(P), b] = 0. \tag{4}$$

This substitution leaves the estimation problem unchanged as $T[\pi(Q), \cdot] = T(Q, \cdot)$ for all $Q \in \Pi$. Moreover, $\pi(P_N) \in \Pi$; so $T[\pi(P_N), \cdot]$ is defined and has a zero in $B$. The analogy principle applied to (4) yields the estimate

$$B_{N\pi} = [c \in B: T[\pi(P_N), c] = 0]. \tag{5}$$

When $P_N \in \Pi$, this estimate is the same as the one defined in equation (3). When $P_N \notin \Pi$, the estimate (5) depends on the selected function $\pi(\cdot)$; hence we write $B_{N\pi}$ rather than $B_N$.

A prominent example of this approach is kernel estimation of Lebesgue density functions. Let $\Pi$ be the space of distributions having Lebesgue densities. The empirical distribution $P_N$ is multinomial and so is not in $\Pi$. But $P_N$ can be smoothed so as to yield a distribution that is in $\Pi$. In particular, the convolution of $P_N$ with any element of $\Pi$ is itself an element of $\Pi$. The density of the convolution is a kernel density estimate. See Manski (1988), Chapter 2.

Direct extension. Sometimes there is a natural direct way to extend the domain of $T(\cdot, \cdot)$, so $T(P_N, \cdot)$ is well-defined. Whenever $T(P_N, \cdot)$ has a zero in $B$, equation (3) gives the analog estimate. If $P_N$ is not in $\Pi$, it may be that $T(P_N, c) \neq 0$ for all $c \in B$. Then the analogy principle suggests selection of an estimate that makes $T(P_N, \cdot)$ as close as possible to zero in some sense.

To put this idea into practice, select an origin-preserving function $r(\cdot)$ which maps values of $T(\cdot, \cdot)$ into the non-negative real half line. That is, let $r(\cdot): \Psi \to [0, \infty)$, with $T = 0 \iff r(T) = 0$. Now replace the equation $T(P, b) = 0$ with the extremum problem

$$\min_{c \in B} r[T(P, c)]. \tag{6}$$

This substitution leaves the estimation problem unchanged as $T(Q, c) = 0 \iff r[T(Q, c)] = 0$ for $(Q, c) \in \Pi \times B$. To estimate $b$, solve the sample analog of (6). Provided only that $r[T(P_N, \cdot)]$ attains its minimum on $B$, the analog estimate is

$$B_{Nr} = \arg\min_{c \in B} r[T(P_N, c)]. \tag{7}$$

If $P_N \in \Pi$, this estimate is the same as the one defined in (3). If $P_N \notin \Pi$ but $T(P_N, \cdot)$ has a zero in $B$, the estimate remains as in (3). If $T(P_N, \cdot)$ is everywhere non-zero, the estimate depends on the selected function $r(\cdot)$; hence we write $B_{Nr}$ rather than $B_N$.

Section 2.2 describes an extraordinarily useful application of this approach, the method of moments.

2.2. Moment problems

Much of present-day econometrics is concerned with estimation of a parameter $b$ solving an equation of the form

$$\int g(z, b)\, dP = 0 \tag{8}$$

or an extremum problem of the form

$$\min_{c \in B} \int h(z, c)\, dP. \tag{9}$$

In (8), $g(\cdot, \cdot)$ is a given function mapping $Z \times B$ into a real vector space. In (9), $h(\cdot, \cdot)$ is a given function mapping $Z \times B$ into the real line. Numerous prominent examples of (8) and (9) will be given in Sections 3 and 4 respectively.

When $P_N \in \Pi$, application of the analogy principle to (8) and (9) yields the estimates

$$B_N = \left[c \in B: \int g(z, c)\, dP_N = 0\right] = \left[c \in B: \frac{1}{N} \sum_{i=1}^{N} g(z_i, c) = 0\right], \tag{10}$$

$$B_N = \arg\min_{c \in B} \int h(z, c)\, dP_N = \arg\min_{c \in B} \frac{1}{N} \sum_{i=1}^{N} h(z_i, c), \tag{11}$$

where $(z_i, i = 1, \ldots, N)$ are the sample observations of $z$.

When $P_N \notin \Pi$, one might either map $P_N$ into $\Pi$ or extend the domain of $T(\cdot, \cdot)$ directly. The latter approach is simplest; the sample analogs of the expectations $\int g(z, \cdot)\, dP$ and $\int h(z, \cdot)\, dP$ are the sample averages $\int g(z, \cdot)\, dP_N$ and $\int h(z, \cdot)\, dP_N$. So (10) and (11) remain analog estimates of the parameters solving (8) and (9).

It remains only to consider the possibility that the estimates may not exist. In applications, $\int h(z, \cdot)\, dP_N$ generally has a minimum. On the other hand, $\int g(z, \cdot)\, dP_N$ often has no zero. In that case, one may select an origin-preserving transformation $r(\cdot)$
and replace (8) with the problem of minimizing $r[\int g(z, \cdot)\, dP]$, as was done in (6). Minimizing the sample analog yields

$$B_{Nr} = \arg\min_{c \in B} r\left[\int g(z, c)\, dP_N\right] = \arg\min_{c \in B} r\left[\frac{1}{N} \sum_{i=1}^{N} g(z_i, c)\right]. \tag{12}$$

Estimation problems relating $b$ to $P$ by (8) or (9) are called moment problems. Estimates of the forms (10), (11), and (12) are method-of-moments estimates. Use of the term "moment" rather than the equally descriptive "expectation," "mean," or "integral" honors the early work of K. Pearson on the method of moments.

Consistency of method-of-moments estimates. Clearly, consistent estimation of $b$ requires that the asserted moment problem has a unique solution; that is, $b$ must be identified. If no solution exists, the estimation problem has been misspecified and $b$ is not defined. If there are multiple solutions, sample data cannot possibly distinguish between them. There is no general approach for determining the number of solutions to equation systems of the form (8) or to extremum problems of the form (9). One must proceed more or less case-by-case.

Given identification, method-of-moments estimates are consistent if the estimation problem is sufficiently regular. Rigorous treatments appear in such econometrics texts as Amemiya (1985), Gallant (1987) and Manski (1988). I provide here a heuristic explanation focusing on (12); case (11) involves no additional considerations.

We are concerned with the behavior of the function $r[\int g(z, \cdot)\, dP_N]$ as $N \to \infty$. The strong law of large numbers implies that for all $c \in B$, $\int g(z, c)\, dP_N \to \int g(z, c)\, dP$ as $N \to \infty$, almost surely. The convergence is uniform on $B$ if the parameter space is sufficiently small, the function $g(\cdot, \cdot)$ sufficiently smooth, and the distribution $P$ sufficiently well-behaved. (For example, it suffices for $B$ to be a compact finite-dimensional set, for $\int |g(z, \cdot)|\, dP$ to be bounded by an integrable function $D(z)$, and for $g(z, \cdot)$ to be continuous on $B$. See Manski (1988), Chapter 7.) If the convergence is uniform and $r(\cdot)$ is smooth, then as $N \to \infty$ the minima on $B$ of $r[\int g(z, \cdot)\, dP_N]$ tend to occur increasingly near the minima of $r[\int g(z, \cdot)\, dP]$. The unique minimum of $r[\int g(z, \cdot)\, dP]$ occurs at $b$. So the estimate $B_{Nr}$ converges to $b$.

Uniform convergence on $B$ of $\int g(z, \cdot)\, dP_N$ to $\int g(z, \cdot)\, dP$ is close to a necessary condition for consistency of method-of-moments estimates. If this condition is seriously violated, $\int g(z, \cdot)\, dP_N$ is not a good sample analog to $\int g(z, \cdot)\, dP$ and the estimation approach does not work. Beginning in the 1930s with the Glivenko-Cantelli Theorem, statisticians and econometricians have steadily broadened the range of specifications of $B$, $g(\cdot, \cdot)$ and $P$ for which uniform laws of large numbers have been shown to hold (e.g. Pollard (1984) and Andrews (1987)). Nevertheless, uniformity does break down in situations that are far from pathological. Perhaps the most important practical concern is the size of the parameter space. Given a specification for $g(\cdot, \cdot)$ and for $P$, uniformity becomes a more demanding property as $B$ becomes larger.

Sampling distributions. The exact sampling distributions of method-of-moments estimates are generally complicated. Hence the practice is to invoke local asymptotic approximations. If the parameter space is finite-dimensional and the estimation problem is sufficiently regular, a method-of-moments estimate $B_{Nr}$ converges at rate $1/\sqrt{N}$ and $\sqrt{N}(B_{Nr} - b)$ has a limiting normal distribution centered at zero. Alternative estimates of a given parameter may have limiting distributions with different variances. This fact suggests use of the variance of the limiting distribution as a criterion for measuring precision. Comparison of the precision of alternative estimators has long engaged the attention of econometric theorists. An estimate is termed asymptotically efficient if the variance of the limiting normal distribution of $\sqrt{N}(B_{Nr} - b)$ is the smallest possible given the available prior information.
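To make the consistency argument concrete, the following Python sketch computes an estimate of the form (12) in a simple over-identified case. It is an illustration only and not part of the original chapter: the exponential population, the moment function g(z, c) = (z - c, z^2 - 2c^2), the squared Euclidean norm chosen for r(.), the grid standing in for the parameter space B, and all function names are assumptions made for the example.

```python
import random


def r(t):
    """Origin-preserving map: squared Euclidean norm, zero only when t is zero."""
    return sum(component * component for component in t)


def mom_estimate(z, grid):
    """Grid minimiser of r[(1/N) sum_i g(z_i, c)], the sample analog in (12).

    Assumed moment function: g(z, c) = (z - c, z**2 - 2*c**2), which has
    mean zero at c = b when z is exponential with mean b.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_z2 = sum(v * v for v in z) / n

    def objective(c):
        # Sample average of g(z_i, c), written with the two sample moments.
        return r((mean_z - c, mean_z2 - 2.0 * c * c))

    return min(grid, key=objective)


def simulate(n, b, seed):
    """Draw n observations from an exponential population with mean b."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / b) for _ in range(n)]


if __name__ == "__main__":
    b_true = 2.0
    # Hypothetical compact parameter space B = [0.5, 4.5], discretised.
    grid = [0.5 + 0.005 * k for k in range(801)]
    for n in (100, 1000, 10000):
        estimate = mom_estimate(simulate(n, b_true, seed=n), grid)
        print(n, round(estimate, 3))
```

With two moment conditions and one parameter, the sample moments generally have no common zero, so the choice of r(.) matters. As N grows, the printed estimates should settle near the assumed value b = 2.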
Hansen (1982) and Chamberlain (1987) provide the central findings on the efficiency of method-of-moments estimates. For an exposition, see Manski (1988), Chapters 8 and 9.

Non-random sampling. In discussing moment problems and estimation problems more generally, I have assumed that the data are a random sample. It is important to understand that random sampling, albeit a useful simplifying idea, is not essential to the success of analog estimation. The essential requirement is that the sampling process be such that relevant features of the empirical distribution converge to corresponding population features.

For example, consider stationary time series problems. Here the data are observations at $N$ dates from a single realization of a stationary stochastic process whose marginal distribution is $P$. So we do not have a random sample from $P$. Nevertheless, dependent sampling versions of the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$.

2.3. Econometric models

We have been discussing an estimation problem relating a parameter $b$ to a probability distribution $P$ generating realizations of an observable random variable $z$. Econometric models typically relate a parameter $b$ to realizations of the observable $z$ and of an unobservable random variable, say $u$. Analog estimation methods may be used to estimate $b$ if one can transform the econometric model into a representation relating $b$ to $P$ and to nuisance parameters.

Formally, suppose that a probability distribution $P_{zu}$ on a space $Z \times U$ characterizes a population. A random sample of $N$ realizations of a random variable $(z, u)$ distributed $P_{zu}$ is drawn and one observes the realizations of $z$ but not of $u$. One knows that $P_{zu}$ is a member of some family $\Pi_{zu}$ of probability distributions on $Z \times U$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$f(z, u, b) = 0, \tag{13}$$

where $f(\cdot, \cdot, \cdot)$ maps $Z \times U \times B$ into some vector space.
Equation (13) is to be interpreted as saying that almost every realization $(\zeta, \eta)$ of $(z, u)$ satisfies the equation $f(\zeta, \eta, b) = 0$.

Equation (13) typically has no content in the absence of information on the probability distribution $P_{zu}$ generating $(z, u)$. A meaningful model combines (13) with some distributional knowledge. The practice has been to impose restrictions on the probability distribution of $u$ conditional on some function of $z$, say $x = x(z)$ taking values in a space $X$. Let $P_u \mid x$ denote this conditional distribution. Then a model is defined by equation (13) and by a restriction on the conditional distributions $(P_u \mid \xi, \xi \in X)$.

Essentially all econometric research has specified $f$ to have one of two forms. A separable model makes the unobserved variable $u$ additively separable, so that

$$f(z, u, b) = u_0(z, b) - u, \tag{14}$$

where $u_0(\cdot, \cdot)$ maps $Z \times B$ into $U$. A response model defines $z = (y, x)$, $Z = Y \times X$, and makes $f$ have the form

$$f(y, x, u, b) = y - y_0(x, u, b), \tag{15}$$

where $y_0(\cdot, \cdot, \cdot)$ maps $X \times U \times B$ into $Y$.

Functional forms (14) and (15) are not mutually exclusive. Some models can be written both ways.

The next two sections survey the many separable and response models implying that $b$ and a nuisance parameter together solve a moment problem. (The nuisance parameter characterizes unrestricted features of $P_u \mid x$.) These models may be estimated by the method of moments if the parameter space is not too large.

3. Method-of-moments estimation of separable models

Separable models suppose that realizations of $(z, u)$ are related to the parameter $b$ through an equation

$$u_0(z, b) = u. \tag{16}$$

In the absence of information restricting the distribution of the unobserved $u$, this equation simply defines $u$ and conveys no information about $b$. In the presence of various distributional restrictions (16) implies that $b$ and a nuisance parameter solve a type of moment equation known as an orthogonality condition, defined here.

Orthogonality conditions. Let $x = x(z)$ take values in a real vector space $X$.
Let $\Gamma$ denote a space in which a nuisance parameter $\gamma$ lives. Let $e(\cdot, \cdot)$ be a function mapping $U \times \Gamma$ into a real vector space. Let $e(\cdot, \cdot)'$ denote the transpose of the column vector $e(\cdot, \cdot)$. The random vectors $x$ and $e(u, \gamma)$ are orthogonal if

$$\int x\, e(u, \gamma)'\, dP_{zu} = 0. \tag{17}$$

Equation (17) relates the observed random variable $x$ to the unobserved random variable $u$. Suppose that (16) holds. Then we can replace $u$ in (17) with $u_0(z, b)$, yielding

$$\int x\, e[u_0(z, b), \gamma]'\, dP = 0. \tag{18}$$

This orthogonality condition is a moment equation relating the parameters $(b, \gamma)$ to the distribution $P$ of the observable $z$.

It is not easy to motivate orthogonality conditions directly, but we can readily show that these conditions are implied by various more transparent distributional restrictions. The remainder of this section describes the leading cases.

3.1. Mean independence

The classical econometric literature on instrumental variables estimation is concerned with separable models in which $x$ and $u$ are known to be uncorrelated. Let $\gamma$ be the mean of $u$. Zero covariance is the orthogonality condition

$$\int x (u - \gamma)'\, dP_{zu} = \int x [u_0(z, b) - \gamma]'\, dP = 0. \tag{19}$$

Most authors incorporate the nuisance parameter $\gamma$ into the specification of $u_0(\cdot, \cdot)$ by giving that function a free intercept. This done, $u$ is declared to have mean zero and equation (19) is rewritten as

$$\int x [u_0(z, b)]'\, dP = 0. \tag{20}$$

To facilitate discussion of a variety of distributional restrictions, I shall keep $\gamma$ explicit.

Zero covariance is sometimes asserted directly, to express a belief that the random variables $x$ and $u$ are unrelated. It is preferable to think of zero covariance as following from a stronger form of unrelatedness. This is the mean-independence condition

$$\int u\, dP_u \mid \xi = \gamma, \quad \xi \in X. \tag{21}$$

Mean independence implies zero covariance but it is difficult to motivate zero covariance in the absence of mean independence. To see why, rewrite (19) as the
iterated expectation

$$\int x (u - \gamma)'\, dP_{zu} = \int x \left[\int (u - \gamma)'\, dP_u \mid x\right] dP_x = 0. \tag{22}$$

This shows that mean independence implies zero covariance. It also shows that $x$ and $u$ are uncorrelated if positive and negative realizations of $x \int (u - \gamma)'\, dP_u \mid x$ balance when weighted by the distribution of $x$. But one rarely has information about $P_x$, certainly not information that would make one confident in (22) in the absence of (21). Hence, an assertion of zero covariance suggests a belief that $x$ and $u$ are unrelated in the sense of mean independence.

Mean independence implies orthogonality conditions beyond (19). Let $v(\cdot)$ be any function mapping $X$ into a real vector space. It follows from (16) and (21) that

$$\int v(x) [u_0(z, b) - \gamma]'\, dP = \int v(x) \left[\int (u - \gamma)'\, dP_u \mid x\right] dP_x = 0, \tag{23}$$

provided only that the integral in (23) exists. So the random variables $v(x)$ and $u_0(z, b)$ are uncorrelated. In other words, all functions of $x$ are instrumental variables.

3.2. Median independence

The assertion that $u$ is mean independent of $x$ expresses a belief that $u$ has the same central tendency conditional on each realization of $x$. Median independence offers another way to express this belief. Median independence alone does not imply an orthogonality condition, but it does when the conditional distributions $P_u \mid \xi$, $\xi \in X$ are componentwise continuous.

Let $U$ be the real line; the vector case introduces no new considerations as we shall deal with $u$ componentwise. For each $\xi$ in $X$, let $m_\xi$ be the median of $u$ conditional on the event $[x = \xi]$. Let $\gamma$ be the unconditional median of $u$. We say that $u$ is median independent of $x$ if

$$m_\xi = \gamma, \quad \xi \in X. \tag{24}$$

It can be shown (see Manski (1988), Chapter 4) that if $P_u \mid \xi$, $\xi \in X$ are continuous probability distributions, their medians solve the conditional moment equations

$$\int \mathrm{sgn}(u - m_\xi)\, dP_u \mid \xi = 0, \quad \xi \in X. \tag{25}$$
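The moment equation (25) has a simple sample analog when x takes a single value, so that only one conditional distribution is involved. The following Python sketch, which is not part of the original chapter, evaluates the sample average of sgn(u_i - c) and checks that the sample median makes it zero. The normal population, the odd sample size, and the function names are assumptions chosen for illustration.

```python
import random
import statistics


def sgn(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def sign_moment(u, c):
    """Sample analog of the integral of sgn(u - c): the average sign."""
    return sum(sgn(ui - c) for ui in u) / len(u)


def analog_median(u):
    """Observed value c that brings |sign_moment(u, c)| closest to zero."""
    return min(sorted(u), key=lambda c: abs(sign_moment(u, c)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed population: normal with median 1.0; odd N gives a unique median.
    u = [rng.gauss(1.0, 2.0) for _ in range(501)]
    m = analog_median(u)
    print(m == statistics.median(u), sign_moment(u, m))
```

When the data contain ties, the average sign may have no exact zero. One would then minimise an origin-preserving function of it, as in (12), which is what `analog_median` does with the absolute value.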