
The Semantics of Grammar Formalisms Seen as Computer Languages

Fernando C. N. Pereira and Stuart M. Shieber
Artificial Intelligence Center, SRI International
and
Center for the Study of Language and Information, Stanford University

Abstract

The design, implementation, and use of grammar formalisms for natural language have constituted a major branch of computational linguistics throughout its development. By viewing grammar formalisms as just a special case of computer languages, we can take advantage of the machinery of denotational semantics to provide a precise specification of their meaning. Using Dana Scott's domain theory, we elucidate the nature of the feature systems used in augmented phrase-structure grammar formalisms, in particular those of recent versions of generalized phrase structure grammar, lexical functional grammar and PATR-II, and provide a denotational semantics for a simple grammar formalism. We find that the mathematical structures developed for this purpose contain an operation of feature generalization, not available in those grammar formalisms, that can be used to give a partial account of the effect of coordination on syntactic features.

1. Introduction¹

The design, implementation, and use of grammar formalisms for natural language have constituted a major branch of computational linguistics throughout its development. However, notwithstanding the obvious superficial similarity between designing a grammar formalism and designing a programming language, the design techniques used for grammar formalisms have almost always fallen short with respect to those now available for programming-language design.

Formal and computational linguists most often explain the effect of a grammar formalism construct either by example or through its actual operation in a particular implementation. Such practices are frowned upon by most programming-language designers; they become even more dubious if one considers that most grammar formalisms in use are based either on a context-free skeleton with augmentations or on some closely related device (such as ATNs), consequently making them obvious candidates for a declarative semantics² extended in the natural way from the declarative semantics of context-free grammars.

The last point deserves amplification. Context-free grammars possess an obvious declarative semantics in which nonterminals represent sets of strings and rules represent n-ary relations over strings. This is brought out by the reinterpretation, familiar from formal language theory, of context-free grammars as polynomials over concatenation and set union. The grammar formalisms developed from the definite-clause subset of first-order logic are the only others used in natural-language analysis that have been accorded a rigorous declarative semantics, in this case derived from the declarative semantics of logic programs [3,12,11].

Much confusion, wasted effort, and dissension have resulted from this state of affairs. In the absence of a rigorous semantics for a given grammar formalism, the user, critic, or implementer of the formalism risks misunderstanding the intended interpretation of a construct, and is in a poor position to compare it to alternatives. Likewise, the inventor of a new formalism can never be sure of how it compares with existing ones. As an example of these difficulties, two simple changes in the implementation of the ATN formalism, the addition of a well-formed substring table and the use of a bottom-up parsing strategy, required a rather subtle and unanticipated reinterpretation of the register-testing and -setting actions, thereby imparting a different meaning to grammars that had been developed for the initial top-down backtrack implementation [22].

¹ The research reported in this paper has been made possible by a gift from the System Development Foundation.
² This use of the term "semantics" should not be confused with the more common usage denoting that portion of a grammar concerned with the meaning of object sentences. Here we are concerned with the meaning of the metalanguage.
Rigorous definitions of grammar formalisms can and should be made available. Looking at grammar formalisms as just a special case of computer languages, we can take advantage of the machinery of denotational semantics [20] to provide a precise specification of their meaning. This approach can elucidate the structure of the data objects manipulated by a formalism and the mathematical relationships among various formalisms, suggest new possibilities for linguistic analysis (the subject matter of the formalisms), and establish connections between grammar formalisms and such other fields of research as programming-language design and theories of abstract data types. This last point is particularly interesting because it opens up several possibilities, among them that of imposing a type discipline on the use of a formalism, with all the attendant advantages of compile-time error checking, modularity, and optimized compilation techniques for grammar rules, and that of relating grammar formalisms to other knowledge representation languages [1].

As a specific contribution of this study, we elucidate the nature of the feature systems used in augmented phrase-structure grammar formalisms, in particular those of recent versions of generalized phrase structure grammar (GPSG) [5,15], lexical functional grammar (LFG) [2] and PATR-II [18,17]; we find that the mathematical structures developed for this purpose contain an operation of feature generalization, not available in those grammar formalisms, that can be used to give a partial account of the effect of coordination on syntactic features.

Just as studies in the semantics of programming languages start by giving semantics for simple languages, so we will start with simple grammar formalisms that capture the essence of the method without an excess of obscuring detail.

The present enterprise should be contrasted with studies of the generative capacity of formalisms using the techniques of formal language theory. First, a precise definition of the semantics of a formalism is a prerequisite for such generative-capacity studies, and this is precisely what we are trying to provide. Second, generative capacity is a very coarse gauge: in particular, it does not distinguish among different formalisms with the same generative capacity that may, however, have very different semantic accounts. Finally, the tools of formal language theory are inadequate to describe at a sufficiently abstract level formalisms that are based on the simultaneous solution of sets of constraints [9,10]. An abstract analysis of those formalisms requires a notion of partial information that is precisely captured by the constructs of denotational semantics.

2. Denotational Semantics

In broad terms, denotational semantics is the study of the connection between programs and mathematical entities that represent their input-output relations. For such an account to be useful, it must be compositional, in the sense that the meaning of a program is developed from the meanings of its parts by a fixed set of mathematical operations that correspond directly to the ways in which the parts participate in the whole. For the purposes of the present work, denotational semantics will mean the semantic domain theory initiated by Scott and Strachey [20]. In accordance with this approach, the meanings of programming-language constructs are certain partial mappings between objects that represent partially specified data objects or partially defined states of computation. The essential idea is that the meaning of a construct describes what information it adds to a partial description of a data object or of a state of computation. Partial descriptions are used because computations in general may not terminate and may therefore never produce a fully defined output, although each individual step may be adding more and more information to a partial description of the undeliverable output.

Domain theory is a mathematical theory of considerable complexity. Potential nontermination and the use of functions as "first-class citizens" in computer languages account for a substantial fraction of that complexity. If, as is the case in the present work, neither of those two aspects comes into play, one may be justified in asking why such a complex apparatus is used. Indeed, both the semantics of context-free grammars mentioned earlier and the semantics of logic grammars in general can be formulated using elementary set theory [7,21]. However, using the more complex machinery may be beneficial for the following reasons:

• Inherent partiality: many grammar formalisms operate in terms of constraints between elements that do not fully specify all the possible features of an element.
• Technical economy: results that require laborious constructions without utilizing domain theory can be reached trivially by using standard results of the theory.
• Suggestiveness: domain theory brings with it a rich mathematical structure that suggests useful operations one might add to a grammar formalism.
• Extensibility: unlike a domain-theoretic account, a specialized semantic account, say in terms of sets, may not be easily extended as new constructs are added to the formalism.
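Before moving to feature systems, it may help to make the set-theoretic semantics of context-free grammars concrete. The sketch below is our illustration, not the paper's: nonterminals denote sets of strings, rules denote operations built from concatenation and set union, and the denotation is computed as a least fixpoint. The toy grammar and the length bound that keeps the iteration finite are assumptions of the example.

```python
# A minimal sketch of the declarative semantics of context-free grammars:
# each nonterminal denotes a set of strings, each rule contributes the
# concatenation of its right-hand-side denotations, and the denotation is
# the least fixpoint of these equations. A bound on string length keeps
# the iteration finite. Grammar and bound are illustrative assumptions.

from itertools import product

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Uther"], ["many knights"]],
    "VP": [["sleeps"], ["sleep"]],
}

def denotations(grammar, max_len=4):
    sets = {nt: set() for nt in grammar}          # start from the empty sets
    changed = True
    while changed:                                # iterate up to the fixpoint
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                # a symbol denotes its set if nonterminal, else itself
                choices = [sets[sym] if sym in sets else {sym} for sym in rhs]
                for parts in product(*choices):
                    s = " ".join(parts)
                    if len(s.split()) <= max_len and s not in sets[nt]:
                        sets[nt].add(s)
                        changed = True
    return sets

print(denotations(GRAMMAR)["S"])
```

The computed denotation of S includes ill-formed strings such as "Uther sleep" and "many knights sleeps", since bare string sets carry no agreement information; the feature systems studied next supply the kind of partial information needed to state such constraints.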
3. The Domain of Feature Structures

We will start with an abstract denotational description of a simple feature system which bears a close resemblance to the feature systems of GPSG, LFG and PATR-II, although this similarity, because of its abstractness, may not be apparent at first glance. Such feature systems tend to use data structures or mathematical objects that are more or less isomorphic to directed graphs of one sort or another, or, as they are sometimes described, partial functions. Just what the relation is between these two ways of viewing things will be explained later. In general, these graph structures are used to encode linguistic information in the form of attribute-value pairs. Most importantly, partial information is critical to the use of such systems, for instance in the variables of definite clause grammars [12] and in the GPSG analysis of coordination [15]. That is, the elements of the feature systems, called feature structures (alternatively, feature bundles, f-structures [2], or terms), can be partial in some sense. The partial descriptions, being in a domain of attributes and complex values, tend to be equational in nature: some feature's value is equated with some other value. Partial descriptions can be understood in one of two ways: either the descriptions represent sets of fully specified elements of an underlying domain, or they are regarded as participating in a relationship of partiality with respect to each other. We will hold to the latter view here.

What are feature structures from this perspective? They are repositories of information about linguistic entities. In domain-theoretic terms, the underlying domain of feature structures F is a recursive domain of partial functions from a set of labels L (features, attribute names, attributes) to complex values or primitive atomic values taken from a set C of constants. Expressed formally, we have the domain equation

    F = [L → F] + C

The solution of this domain equation can be understood as a set of trees (finite or infinite) with branches labeled by elements of L, and with other trees or constants as nodes. The branches l₁, …, lₘ from a node n point to the values n(l₁), …, n(lₘ) for which the node, as a partial function, is defined.
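To fix intuitions, here is a small sketch of the tree reading of this domain equation, ours rather than the paper's: a feature structure is either an atomic constant from C or a Python dict standing in for a partial function from labels in L to substructures. The "Uther" structure anticipates the sample lexicon of Section 5.

```python
# A sketch of the domain equation F = [L -> F] + C read concretely:
# a feature structure is either an atomic constant or a partial function
# from labels to feature structures, modelled here as a Python dict.
# The example structure is illustrative, not taken from the paper.

FeatureStructure = dict | str   # dict = partial function, str = constant in C

uther: FeatureStructure = {
    "agr": {"num": "sg", "per": "3"},   # agreement is itself a structure
}

def value_at(fs: FeatureStructure, path: tuple[str, ...]):
    """Follow a path <l1 ... lm> through the tree; None if undefined."""
    for label in path:
        if not isinstance(fs, dict) or label not in fs:
            return None        # the partial function is undefined here
        fs = fs[label]
    return fs

print(value_at(uther, ("agr", "num")))   # -> 'sg'
print(value_at(uther, ("agr", "gen")))   # -> None (partial: no gender given)
```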
4. The Domain of Descriptions

What the grammar formalism does is to talk about F, not in F. That is, the grammar formalism uses a domain of descriptions of elements of F. From an intuitive standpoint, this is because, for any given phrase, we may know facts about it that cannot be encoded in the partial function associated with it. A partial description of an element x of F will be a set of equations that constrain the values of x on certain labels. In general, to describe an element x ∈ F we have equations of the following forms:

    (⋯(x(li₁))⋯)(liₘ) = (⋯(x(lj₁))⋯)(ljₙ)
    (⋯(x(li₁))⋯)(liₘ) = cₖ

which we prefer to write as

    ⟨li₁ ⋯ liₘ⟩ = ⟨lj₁ ⋯ ljₙ⟩
    ⟨li₁ ⋯ liₘ⟩ = cₖ

with x implicit. The terms of such equations are constants c ∈ C or paths ⟨li₁ ⋯ liₘ⟩, which we identify in what follows with strings in L*. Taken together, constants and paths comprise the descriptors.

Using Scott's information systems approach to domain construction [16], we can now build directly a characterization of feature structures in terms of information-bearing elements, the equations, that engender a system complete with notions of compatibility and partiality of information. The information system D describing the elements of F is defined, following Scott, as the tuple

    D = ⟨𝒫, Δ, Con, ⊢⟩

where 𝒫 is a set of propositions, Con is a set of finite subsets of 𝒫 (the consistent subsets), ⊢ is an entailment relation between elements of Con and elements of 𝒫, and Δ is a special least informative element that gives no information at all. We say that a subset S of 𝒫 is deductively closed if every proposition entailed by a consistent subset of S is in S. The deductive closure S̄ of S ⊆ 𝒫 is the smallest deductively closed subset of 𝒫 that contains S.

The descriptor equations discussed earlier are the propositions of the information system for feature structure descriptions. Equations express constraints among feature values in a feature structure, and the entailment relation encodes the reflexivity, symmetry, transitivity and substitutivity of equality. More precisely, we say that a finite set of equations E entails an equation e if one of the following holds:

• Membership: e ∈ E
• Reflexivity: e is Δ, or d = d for some descriptor d
• Symmetry: e is d₁ = d₂ and d₂ = d₁ is in E
• Transitivity: e is d₁ = d₂ and there is a descriptor d such that d₁ = d and d = d₂ are in E
• Substitutivity: e is d₁ = p₁ · d₂ and both p₁ = p₂ and d₁ = p₂ · d₂ are in E
• Iteration: there is a set of equations E′ such that E′ ⊢ e and, for all e′ ∈ E′, E ⊢ e′

With this notion of entailment, the most natural definition of the set Con is that a finite subset E of 𝒫 is consistent if and only if it does not entail an inconsistent equation, one of the form c₁ = c₂ with c₁ and c₂ distinct constants. An arbitrary subset of 𝒫 is consistent if and only if all its finite subsets are consistent in the way just defined. The consistent and deductively closed subsets of 𝒫, ordered by inclusion, form a complete partial order or domain D, our domain of descriptions of feature structures. Deductive closure is used to define the elements of D so that elements defined by equivalent sets of equations are the same. In the rest of this paper, we will specify elements of D by convenient sets of equations, leaving the equations in the closure implicit.

The inclusion order ⊑ in D provides the notion of a description being more or less specific than another. The least-upper-bound operation ⊔ combines two descriptions into the least instantiated description that satisfies the equations in both descriptions, their unification.
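The entailment relation and the consistency test it induces can be prototyped directly. The following sketch is our illustration, not the paper's construction: it approximates entailment by a congruence closure over the descriptors that actually occur in a finite description (union-find for membership, symmetry and transitivity, plus a fixpoint loop for substitutivity) and reports inconsistency when two distinct constants fall into one equivalence class.

```python
# Descriptions as finite sets of equations between descriptors: a descriptor
# is a constant (plain string) or a path (tuple of labels). Entailment is
# approximated by congruence closure restricted to occurring descriptors,
# which suffices to detect forced equations between distinct constants.

def find(parent, x):
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]         # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

def consistent(equations):
    parent = {}
    for lhs, rhs in equations:                # membership, symmetry for free
        union(parent, lhs, rhs)
    paths = {d for eq in equations for d in eq if isinstance(d, tuple)}
    changed = True
    while changed:                            # substitutivity, to a fixpoint
        changed = False
        for a in paths:
            for b in paths:
                if a == b or find(parent, a) == find(parent, b):
                    continue
                for k in range(1, min(len(a), len(b))):
                    # equal prefixes plus identical suffixes force a = b
                    if a[-k:] == b[-k:] and \
                       find(parent, a[:-k]) == find(parent, b[:-k]):
                        union(parent, a, b)
                        changed = True
                        break
    groups = {}                               # no class may hold two constants
    for eq in equations:
        for d in eq:
            if isinstance(d, str):
                if groups.setdefault(find(parent, d), d) != d:
                    return False
    return True

# <f> = <g> with <f h> = c1 and <g h> = c2 entails c1 = c2: inconsistent.
print(consistent({(("f",), ("g",)), (("f", "h"), "c1"), (("g", "h"), "c2")}))
```

The printed result is False: from ⟨f⟩ = ⟨g⟩, substitutivity forces ⟨f h⟩ = ⟨g h⟩, which in turn equates the distinct constants c1 and c2.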
The greatest-lower-bound operation ⊓ gives the most instantiated description containing all the equations common to two descriptions, their generalization.

The foregoing definition of consistency may seem very natural, but it has the technical disadvantage that, in general, the union of two consistent sets is not itself a consistent set; therefore, the corresponding operation of unification may not be defined on certain pairs of inputs. Although this does not cause problems at this stage, it fails to deal with the fact that failure to unify is not the same as lack of definition, and it causes technical difficulties when providing rule denotations. We therefore need a slightly less natural definition. First we add another statement to the specification of the entailment relation:

• Falsity: if e is inconsistent, {e} entails every element of 𝒫.

That is, falsity entails anything. Next we define Con to be simply the set of all finite subsets of 𝒫. The set Con no longer corresponds to sets of equations that are consistent in the usual equational sense. With the new definitions of Con and ⊢, the deductive closure of a set containing an inconsistent equation is the whole of 𝒫. The partial order D is now a lattice with top element ⊤ = 𝒫, and the unification operation ⊔ is always defined, returning ⊤ on unification failure.
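On the tree reading of feature structures, both lattice operations have a direct computational analogue. The following sketch is our illustration; it ignores reentrancy links, and so works on trees of the kind δ produces below rather than on D itself. Unification is total, returning a TOP sentinel on failure, and generalization keeps only the information common to two structures.

```python
# A sketch of unification (join) and generalization (meet) on the tree
# reading of feature structures, ignoring reentrancy links for brevity.
# TOP plays the role of the top element: unification is always defined
# and returns TOP on failure instead of being undefined.

TOP = "⊤"   # inconsistent element, e.g. two distinct constants joined

def unify(a, b):
    """Least upper bound: all information from both, or TOP on clash."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for label, v in b.items():
            out[label] = unify(out[label], v) if label in out else v
            if out[label] == TOP:
                return TOP             # propagate failure upward
        return out
    return TOP                         # distinct constants, or constant vs dict

def generalize(a, b):
    """Greatest lower bound: only the information common to both."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for label in a.keys() & b.keys():
            g = generalize(a[label], b[label])
            if g != {}:                # drop labels with no shared content
                out[label] = g
        return out
    return {}                          # nothing in common: the least element

sg = {"agr": {"num": "sg", "per": "3"}}
pl = {"agr": {"num": "pl", "per": "3"}}
print(unify(sg, {"agr": {"gen": "masc"}}))  # consistent join: adds gender
print(unify(sg, pl))                        # ⊤: sg and pl clash
print(generalize(sg, pl))                   # {'agr': {'per': '3'}}
```

Generalization is the operation the paper singles out: for instance, generalizing a singular and a plural description above retains only the shared person information, the kind of behavior behind the partial account of coordination mentioned in the introduction.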
We can now define the description mapping δ : D → F that relates descriptions to the described feature structures. The idea is that, in proceeding from a description d ∈ D to a feature structure f ∈ F, we keep only definite information about values and discard information that only states value constraints but does not specify the values themselves. More precisely, seeing d as a set of equations, we consider only the subset ⌊d⌋ of d whose elements have the form

    ⟨l₁ ⋯ lₘ⟩ = cₖ

Each e ∈ ⌊d⌋ of this form defines an element f(e) of F by the equations

    f(e)(l₁) = f₁
    f₁(l₂) = f₂
    ⋮
    fₘ₋₁(lₘ) = cₖ

with each of the fᵢ undefined for all other labels. Then we can define δ(d) as

    δ(d) = ⊔ { f(e) : e ∈ ⌊d⌋ }

This description mapping can be shown to be continuous in the sense of domain theory; that is, it has the properties that increasing information in a description leads to nondecreasing information in the described structures (monotonicity) and that, if a sequence of descriptions approximates another description, the same condition holds for the described structures. Note that δ may map several elements of D onto one element of F. For example, the elements given by the two sets of equations

    { ⟨f h⟩ = c, ⟨g i⟩ = c, ⟨f h⟩ = ⟨g i⟩ }   and   { ⟨f h⟩ = c, ⟨g i⟩ = c }

describe the same structure, because the description mapping ignores the link between ⟨f h⟩ and ⟨g i⟩ in the first description. Such links are useful only when unifying with further descriptive elements, not in the completed feature structure, which merely provides feature-value assignments.

Informally, we can think of elements of D as directed rooted graphs and of elements of F as their unfoldings as trees, the unfolding being given by the mapping δ. It is worth noting that if a description is cyclic, that is, if it has cycles when viewed as a directed graph, then the resulting feature tree will be infinite.³ Stated more precisely, an element f of a domain is finite if, for any ascending sequence {dᵢ} such that f ⊑ ⊔ᵢ dᵢ, there is an i such that f ⊑ dᵢ. The cyclic elements of D are then those finite elements that are mapped by δ into nonfinite elements of F.

³ More precisely a rational tree, that is, a tree with a finite number of distinct subtrees.
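Seen computationally, δ is a straightforward extraction. The sketch below is ours, using the dict encoding introduced earlier: it keeps only the ground equations ⟨l₁ ⋯ lₘ⟩ = c of a description and builds the tree they determine, assuming the description is consistent and silently ignoring path-to-path links, as in the example just given.

```python
# A sketch of the description mapping δ, assuming a consistent description.
# A description is a set of (lhs, rhs) pairs; paths are tuples of labels,
# constants are strings. Only equations of the ground form <l1 ... lm> = c
# contribute to the feature structure; links between paths are discarded.

def delta(description):
    tree = {}
    for path, value in description:
        if not isinstance(value, str):
            continue               # a path = path link: no value information
        node = tree
        for label in path[:-1]:    # build the spine f(e)(l1)=f1, f1(l2)=f2, ...
            node = node.setdefault(label, {})
        node[path[-1]] = value     # ... ending in the constant c
    return tree

d = {(("f", "h"), "c"), (("g", "i"), "c"), (("f", "h"), ("g", "i"))}
print(delta(d))   # both ground equations survive; the link is ignored
```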
5. Providing a Denotation for a Grammar

We now move on to the question of how the domain D is used to provide a denotational semantics for a grammar formalism. We take a simple grammar formalism with rules consisting of a context-free part over a nonterminal vocabulary 𝒩 = {N₁, …, Nₖ} and a set of equations over paths in ([0..∞] · L*) ∪ C. A sample rule might be

    S → NP VP
        ⟨0 subj⟩ = ⟨1⟩
        ⟨0 predicate⟩ = ⟨2⟩
        ⟨1 agr⟩ = ⟨2 agr⟩

This is a simplification of the rule format used in the PATR-II formalism [18,17]. The rule can be read as "an S is an NP followed by a VP, where the subject of the S is the NP, its predicate the VP, and the agreement of the NP the same as the agreement of the VP". More formally, a grammar is a quintuple G = ⟨𝒩, S, L, C, R⟩, where

• 𝒩 is a finite, nonempty set of nonterminals N₁, …, Nₖ
• S is the set of strings over some alphabet (a flat domain with an ancillary continuous concatenation function, notated with the symbol ·)
• R is a set of pairs r = ⟨N₀ → Nₙ₁ ⋯ Nₙₘ, E_r⟩, where E_r is a set of equations between elements of ([0..m] · L*) ∪ C

As with context-free grammars, local ambiguity of a grammar means that in general there are several ways of assembling the same subphrases into phrases. Thus, the semantics of context-free grammars is given in terms of sets of strings. The situation is somewhat more complicated in our sample formalism. The objects specified by the grammar are pairs of a string and a partial description. Because of partiality, the appropriate construction cannot be given in terms of sets of string-description pairs, but rather in terms of the related domain construction of powerdomains [14,19,16]. We will use the Hoare powerdomain P = P_H(S × D) of the domain S × D of string-description pairs. Each element of P is an approximation of a transduction relation, which is an association between strings and their possible descriptions.

We can get a feeling for what the domain P is doing by examining our notion of lexicon. A lexicon will be an element of the domain Pᵏ, associating with each of the k nonterminals Nᵢ, 1 ≤ i ≤ k, a transduction relation from the corresponding coordinate of Pᵏ. Thus, for each nonterminal, the lexicon tells us what phrases are under that nonterminal and what possible descriptions each such phrase has. Here is a sample lexicon:

    ⟨"Uther", {⟨agr num⟩ = sg, ⟨agr per⟩ = 3}⟩
    ⟨"many knights", { …

The four steps for applying a rule r = ⟨N₀ → Nₙ₁ ⋯ Nₙₘ, E_r⟩ to string-description pairs ⟨s₁, d₁⟩, …, ⟨sₘ, dₘ⟩ are as follows. First, we index each dᵢ into dᵢ′ by replacing every path p in any of its equations with the path i · p. We then combine these indexed descriptions with the rule by unifying the deductive closure of E_r with all the indexed descriptions. The rest of the work of applying a rule, extracting the result, is done by the projection and deindexing steps.
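These four steps can be sketched for the sample rule. What follows is our reading, not the paper's definition: we reuse unify and TOP from the Section 4 sketch; installing each child at position i under a single root plays the role of the indexing step, enforcing the rule equations is the combination step, and returning the subtree at position 0 stands in for projection and deindexing. A single left-to-right pass over the equations suffices for this example, though a full implementation would iterate or use structure sharing, and would work on closed equation sets rather than reentrancy-free trees.

```python
# A sketch of rule application for the sample rule S -> NP VP. Assumes the
# unify/TOP definitions from the Section 4 sketch are in scope. Positions
# 0..m index the mother and the children, as in the rule's path equations.

def get(fs, path):
    for label in path:
        fs = fs.setdefault(label, {})   # an undefined path carries no information
    return fs

def put(fs, path, value):
    for label in path[:-1]:
        fs = fs.setdefault(label, {})
    fs[path[-1]] = value

def apply_rule(equations, children):
    node = {0: {}}                      # position 0 is the mother ("indexing")
    node.update({i + 1: c for i, c in enumerate(children)})
    for lhs, rhs in equations:          # "combination": join the equated values
        joined = unify(get(node, lhs), get(node, rhs))
        if joined == TOP:
            return TOP                  # the rule fails on these children
        put(node, lhs, joined)
        put(node, rhs, joined)
    return node[0]                      # "projection" and "deindexing"

RULE = [((0, "subj"), (1,)),            # <0 subj> = <1>
        ((0, "predicate"), (2,)),       # <0 predicate> = <2>
        ((1, "agr"), (2, "agr"))]       # <1 agr> = <2 agr>

np = {"agr": {"num": "sg", "per": "3"}}
vp = {"agr": {"num": "sg"}}
print(apply_rule(RULE, [np, vp]))       # mother with subj/predicate, agr shared
```

With a plural VP, say {"agr": {"num": "pl"}}, the same call returns TOP, mirroring the way the total unification operation of Section 4 signals failure.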