Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures"

pdf
Số trang Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures" 8 Cỡ tệp Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures" 170 KB Lượt tải Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures" 0 Lượt đọc Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures" 0
Đánh giá Báo cáo khoa học: "Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures"
5 ( 22 lượt)
Nhấn vào bên dưới để tải tài liệu
Để tải xuống xem đầy đủ hãy nhấn vào bên trên
Chủ đề liên quan

Nội dung

Beyond Projectivity: Multilingual Evaluation of Constraints and Measures on Non-Projective Structures Jiřı́ Havelka Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic havelka@ufal.mff.cuni.cz Abstract Dependency analysis of natural language has gained importance for its applicability to NLP tasks. Non-projective structures are common in dependency analysis, therefore we need fine-grained means of describing them, especially for the purposes of machine-learning oriented approaches like parsing. We present an evaluation on twelve languages which explores several constraints and measures on non-projective structures. We pursue an edge-based approach concentrating on properties of individual edges as opposed to properties of whole trees. In our evaluation, we include previously unreported measures taking into account levels of nodes in dependency trees. Our empirical results corroborate theoretical results and show that an edge-based approach using levels of nodes provides an accurate and at the same time expressive means for capturing non-projective structures in natural language. 1 Introduction Dependency analysis of natural language has been gaining an ever increasing interest thanks to its applicability in many tasks of NLP—a recent example is the dependency parsing work of McDonald et al. (2005), which introduces an approach based on the search for maximum spanning trees, capable of handling non-projective structures naturally. The study of dependency structures occurring in natural language can be approached from two sides: 608 by trying to delimit permissible dependency structures through formal constraints (for a recent review paper, see Kuhlmann and Nivre (2006)), or by providing their linguistic description (see e.g. Veselá et al. (2004) and Hajičová et al. (2004) for a linguistic analysis of non-projective constructions in Czech.1 ) We think that it is worth bearing in mind that neither syntactic structures in dependency treebanks, nor structures arising in machine-learning approaches, such as MST dependency parsing, need a priori fall into any formal subclass of dependency trees. We should therefore aim at formal means capable of describing all non-projective structures that are both expressive and fine-grained enough to be useful in statistical approaches, and at the same time suitable for an adequate linguistic description.2 Holan et al. (1998) first defined an infinite hierarchy of classes of dependency trees, going from projective to unrestricted dependency trees, based on the notion of gap degree for subtrees (cf. Section 3). Holan et al. (2000) present linguistic considerations concerning Czech and English with respect to this hierarchy (cf. also Section 6). In this paper, we consider all constraints and measures evaluated by Kuhlmann and Nivre (2006)— with some minor variations, cf. Section 4.2. Ad1 These two papers contain an error concerning an alternative condition of projectivity, which is rectified in Havelka (2005). 2 The importance of such means becomes more evident from the asymptotically negligible proportion of projective trees to all dependency trees; there are super-exponentially many unrestricted trees compared to exponentially many projective trees on n nodes. Unrestricted dependency trees (i.e. labelled rooted trees) and projective dependency trees are counted by sequences A000169 and A006013 (offset 1), respectively, in the On-Line Encyclopedia of Sequences (Sloane, 2007). Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 608–615, Prague, Czech Republic, June 2007. c 2007 Association for Computational Linguistics ditionally, we introduce several measures not considered in their work. We also extend the empirical basis from Czech and Danish to twelve languages, which were made available in the CoNLL-X shared task on dependency parsing. In our evaluation, we do not address the issue of what possible effects the annotations and/or conversions used when creating the data might have on non-projective structures in the different languages. The newly considered measures have the first or both of the following desiderata: they are based on properties of individual non-projective edges (cf. Definition 3); and they take into account levels of nodes in dependency trees explicitly. None of the constraints and measures in Kuhlmann and Nivre (2006) take into account levels of nodes explicitly. Level types of non-projective edges, introduced by Havelka (2005), have both desiderata. They provide an edge-based means of characterizing all nonprojective structures; they also have some further interesting formal properties. We propose a novel, more detailed measure, level signatures of non-projective edges, combining levels of nodes with the partitioning of gaps of nonprojective edges into components. We derive a formal property of these signatures that links them to the constraint of well-nestedness, which is an extension of the result for level types (see also Havelka (2007b)). The paper is organized as follows: Section 2 contains formal preliminaries; in Section 3 we review the constraint of projectivity and define related notions necessary in Section 4, where we define and discuss all evaluated constraints and measures; Section 5 describes our data and experimental setup; empirical results are presented in Section 6. 2 Formal preliminaries Here we provide basic definitions and notation used in subsequent sections. Definition 1 A dependency tree is a triple (V, →, ), where V is a finite set of nodes, → a dependency relation on V , and  a total order on V .3 3 We adopt the following convention: nodes are drawn topdown according to their increasing level, with nodes on the same level being the same distance from the root; nodes are drawn from left to right according to the total order on nodes; edges are drawn as solid lines, paths as dotted curves. 609 Relation → models linguistic dependency, and so represents a directed, rooted tree on V . There are many ways of characterizing rooted trees, we give here a characterization via the properties of →: there is a root r ∈ V such that r →∗ v for all v ∈ V and there is a unique edge p → v for all v ∈ V , v 6= r, and no edge into r. Relation →∗ is the reflexive transitive closure of → and is usually called subordination. For each node i we define its level as the length of the path r →∗ i; we denote it leveli . The symmetrization ↔ = → ∪ →−1 makes it possible to talk about edges (pairs of nodes i, j such that i → j) without explicitly specifying the parent (head; i here) and the child (dependent; j here); so → represents directed edges and ↔ undirected edges. To retain the ability to talkabout the direction of edges,  we define Parenti↔ j = i j if i → j if j → i and Childi↔ j = j i if i → j . if j → i To make the exposition clearer by avoiding overuse of the symbol →, we introduce notation for rooted subtrees not only for nodes, but also for edges: Subtreei = {v ∈ V | i →∗ v}, Subtreei↔ j = {v ∈ V | Parenti↔ j →∗ v} (note that the subtree of an edge is defined relative to its parent node). To be able to talk concisely about the total order on nodes , we define open intervals whose endpoints need not be in a prescribed order (i, j) = {v ∈ V | min {i, j} ≺ v ≺ max {i, j}}. 3 Condition of projectivity Projectivity of a dependency tree can be characterized both through the properties of its subtrees and through the properties of its edges.4 Definition 2 A dependency tree T = (V, →, ) is projective if it satisfies the following equivalent conditions: (Harper & Hays) i → j & v ∈ (i, j) =⇒ v ∈ Subtreei , (Lecerf & Ihm) j ∈ Subtreei & v ∈ (i, j) =⇒ v ∈ Subtreei , (Fitialov) j1 , j2 ∈ Subtreei & v ∈ ( j1 , j2 ) =⇒ v ∈ Subtreei . Otherwise T is non-projective. 4 There are many other equivalent characterizations of projectivity, we give only three historically prominent ones. It was Marcus (1965) who proved the equivalence of the conditions in Definition 2, proposed in the early 1960’s (we denote them by the names of those to whom Marcus attributes their authorship). We see that the antecedents of the projectivity conditions move from edge-focused to subtreefocused (i.e. from talking about dependency to talking about subordination). It is the condition of Fitialov that has been mostly explored when studying so-called relaxations of projectivity. (The condition is usually worded as follows: A dependency tree is projective if the nodes of all its subtrees constitute contiguous intervals in the total order on nodes.) However, we find the condition of Harper & Hays to be the most appealing from the linguistic point of view because it gives prominence to the primary notion of dependency edges over the derived notion of subordination. We therefore use an edge-based approach whenever we find it suitable. To that end, we need the notion of a nonprojective edge and its gap. Definition 3 For any edge i ↔ j in a dependency tree T we define its gap as follows Gapi↔ j = {v ∈ V | v ∈ (i, j) & v ∈ / Subtreei↔ j } . An edge with an empty gap is projective, an edge whose gap is non-empty is non-projective.5 We see that non-projective are those edges i ↔ j for which there is a node v such that together they violate the condition of Harper & Hays; we group all such nodes v into Gapi↔ j , the gap of the nonprojective edge i ↔ j. The notion of gap is defined differently for subtrees of a dependency tree (Holan et al., 1998; Bodirsky et al., 2005). There it is defined through the nodes of the whole dependency tree not in the considered subtree that intervene between its nodes in the total order on nodes . 4 Relaxations of projectivity: evaluated constraints and measures In this section we present all constraints and measures on dependency trees that we evaluate empir5 In figures with sample configurations we adopt this convention: for a non-projective edge, we draw all nodes in its gap explicitly and assume that no node on any path crossing the span of the edge lies in the interval delimited by its endpoints. 610 ically in Section 6. First we give definitions of global constraints on dependency trees, then we present measures of non-projectivity based on properties of individual non-projective edges (some of the edge-based measures have corresponding treebased counterparts, however we do not discuss them in detail). 4.1 Tree constraints We consider the following three global constraints on dependency trees: projectivity, planarity, and well-nestedness. All three constraints can be applied to more general structures, e.g. dependency forests or even general directed graphs. Here we adhere to their primary application to dependency trees. Definition 4 A dependency tree T is non-planar if there are two edges i1 ↔ j1 , i2 ↔ j2 in T such that i1 ∈ (i2 , j2 ) & i2 ∈ (i1 , j1 ) . Otherwise T is planar. Planarity is a relaxation of projectivity that corresponds to the “no crossing edges” constraint. Although it might get confused with projectivity, it is in fact a strictly weaker constraint. Planarity is equivalent to projectivity for dependency trees with their root node at either the left or right fringe of the tree. Planarity is a recent name for a constraint studied under different names already in the 1960’s— we are aware of independent work in the USSR (weakly non-projective trees; see the survey paper by Dikovsky and Modina (2000) for references) and in Czechoslovakia (smooth trees; Nebeský (1979) presents a survey of his results). Definition 5 A dependency tree T is ill-nested if there are two non-projective edges i1 ↔ j1 , i2 ↔ j2 in T such that i1 ∈ Gapi2 ↔ j2 & i2 ∈ Gapi1 ↔ j1 . Otherwise T is well-nested. Well-nestedness was proposed by Bodirsky et al. (2005). The original formulation forbids interleaving of disjoint subtrees in the total order on nodes; we present an equivalent formulation in terms of non-projective edges, derived in (Havelka, 2007b). Figure 1 illustrates the subset hierarchy between classes of dependency trees satisfying the particular constraints: projective ( planar ( well-nested ( unrestricted projective planar well-nested unrestricted Figure 1: Sample dependency trees (trees satisfy corresponding constraints and violate all preceding ones) positive type type 0 negative type 4.2 Edge measures Figure 2: Sample configurations with non-projective edges of different level types The first two measures are based on two ways of partitioning the gap of a non-projective edge—into intervals and into components. The third measure, level type, is based on levels of nodes. We also propose a novel measure combining levels of nodes and the partitioning of gaps into components. of the dependency tree (there may be nodes in the subtree of the component root that lie outside the span of the particular non-projective edge). Definition 6 For any edge i ↔ j in a dependency tree T we define its interval degree as follows idegi↔ j = number of intervals in Gapi↔ j . By an interval we mean a contiguous interval in , i.e. a maximal set of nodes comprising all nodes between its endpoints in the total order on nodes . This measure corresponds to the tree-based gap degree measure in (Kuhlmann and Nivre, 2006), which was first introduced in (Holan et al., 1998)— there it is defined as the maximum over gap degrees of all subtrees of a dependency tree (the gap degree of a subtree is the number of contiguous intervals in the gap of the subtree). The interval degree of an edge is bounded from above by the gap degree of the subtree rooted in its parent node. Definition 7 For any edge i ↔ j in a dependency tree T we define its component degree as follows cdegi↔ j = number of components in Gapi↔ j . By a component we mean a connected component in the relation ↔, in other words a weak component in the relation → (we consider relations induced on the set Gapi↔ j by relations on T ). This measure was introduced by Nivre (2006); Kuhlmann and Nivre (2006) call it edge degree. Again, they define it as the maximum over all edges. Each component of a gap can be represented by a single node, its root in the dependency relation induced on the nodes of the gap (i.e. a node of the component closest to the root of the whole tree). Note that a component need not constitute a full subtree 611 Definition 8 The level type (or just type) of a nonprojective edge i ↔ j in a dependency tree T is defined as follows Typei↔ j = levelChildi↔ j − minn∈Gapi↔ j leveln . The level type of an edge is the relative distance in levels of its child node and a node in its gap closest to the root; there may be more than one node witnessing an edge’s type. For sample configurations see Figure 2. Properties of level types are presented in Havelka (2005; 2007b).6 We propose a new measure combining level types and component degrees. (We do not use interval degrees, i.e. the partitioning of gaps into intervals, because we cannot specify a unique representative of an interval with respect to the tree structure.) Definition 9 The level signature (or just signature) of an edge i ↔ j in a dependency tree T is a mapping Signaturei↔ j : P (V ) → Z N0 defined as follows Signaturei↔ j = {levelChildi↔ j − levelr | r is component root in Gapi↔ j } . (The right-hand side is considered as a multiset, i.e. elements may repeat.) We call the elements of a signature component levels. The signature of an edge is a multiset consisting of the relative distances in levels of all component roots in its gap from its child node. Further, we disregard any possible orderings on signatures and concentrate only on the relative distances in levels. We present signatures as non6 For example, presence of non-projective edges of nonnegative level type in equivalent to non-projectivity of a dependency tree; moreover, all such edges can be found in linear time. decreasing sequences and write them in angle brackets h i, component levels separated by commas (by doing so, we avoid combinatorial explosion). Notice that level signatures subsume level types: the level type of a non-projective edge is the component level of any of possibly several component roots closest to the root of the whole tree. In other words, the level type of an edge is equal to the largest component level occurring in its level signature. Level signatures share interesting formal properties with level types of non-projective edges. The following result is a direct extension of the results presented in Havelka (2005; 2007b). Theorem 10 Let i ↔ j be a non-projective edge in a dependency tree T . For any component c in Gapi↔ j represented by root rc with component level lc ≤ 0 (< 0) there is a non-projective edge v → rc in T with Typev↔rc ≥ 0 (> 0) such that either i ∈ Gapv↔rc , or j ∈ Gapv↔rc . From the assumptions lc ≤ 0 and rc ∈ Gapi↔ j the parent v of node rc lies outside the span of the edge i ↔ j, hence v ∈ / Gapi↔ j . Thus either i ∈ (v, rc ), or j ∈ (v, rc ). Since levelv ≥ levelParenti↔ j , we have that Parenti↔ j ∈ / Subtreev , and so either i ∈ Gapv↔rc , or j ∈ Gapv↔rc . Finally from lc = levelChildi↔ j − levelrc ≤ 0 (< 0) we get levelrc − levelChildi↔ j ≥ 0 (> 0), hence Typev↔rc ≥ 0 (> 0). P ROOF. This result links level signatures to wellnestedness: it tells us that whenever an edge’s signature contains a nonpositive component level, the whole dependency tree is ill-nested (because then there are two edges satisfying Definition 5). All discussed edge measures take integer values: interval and component degrees take only nonnegative values, level types and level signatures take integer values (in all cases, their absolute values are bounded by the size of the whole dependency tree). Both interval and component degrees are defined also for projective edges (for which they take value 0), level type is undefined for projective edges, however the level signature of projective edges is defined—it is the empty multiset/sequence. 5 Data and experimental setup Figure 3: Sample non-projective tree considered planar in empirical evaluation task on dependency parsing (Buchholz and Marsi, 2006). In alphabetical order they are: Arabic, Bulgarian, Czech, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish, and Turkish (Hajič et al., 2004; Simov et al., 2005; Böhmová et al., 2003; Kromann, 2003; van der Beek et al., 2002; Brants et al., 2002; Kawata and Bartels, 2000; Afonso et al., 2002; Džeroski et al., 2006; Civit Torruella and Martı́ Antonı́n, 2002; Nilsson et al., 2005; Oflazer et al., 2003).7 We do not include Chinese, which is also available in this data format, because all trees in this data set are projective. We take the data “as is”, although we are aware that structures occurring in different languages depend on the annotations and/or conversions used (some languages were not originally annotated with dependency syntax, but only converted to a unified dependency format from other representations). The CoNLL data format is a simple tabular format for capturing dependency analyses of natural language sentences. For each sentence, it uses a technical root node to which dependency analyses of parts of the sentence (possibly several) are attached. Equivalently, the representation of a sentence can be viewed as a forest consisting of dependency trees. By conjoining partial dependency analyses under one technical root node, we let all their edges interact. Since the technical root comes before the sentence itself, no new non-projective edges are introduced. However, edges from technical roots may introduce non-planarity. Therefore, in our empirical evaluation we disregard all such edges when counting trees conforming to the planarity constraint; we also exclude them from the total numbers of edges. Figure 3 exemplifies how this may affect counts of non-planar trees;8 cf. also the remark after Definition 4. Counts of well-nested trees are not affected. 7 All data sets are the train parts of the CoNLL-X shared task. 8 The sample tree is non-planar according to Definition 4, We evaluate all constraints and measures described however we do not consider it as such, because all pairs of in the previous section on 12 languages, whose tree- “crossing edges” involve an edge from the technical root (edges banks were made available in the CoNLL-X shared from the technical root are depicted as dotted lines). 612 6 Empirical results Our complete results for global constraints on dependency trees are given in Table 1. They confirm the findings of Kuhlmann and Nivre (2006): planarity seems to be almost as restrictive as projectivity; well-nestedness, on the other hand, covers large proportions of trees in all languages. In contrast to global constraints, properties of individual non-projective edges allow us to pinpoint the causes of non-projectivity. Therefore they provide a means for a much more fine-grained classification of non-projective structures occurring in natural language. Table 2 presents highlights of our analysis of edge measures. Both interval and component degrees take generally low values. On the other hand, Holan et al. (1998; 2000) show that at least for Czech neither of these two measures can in principle be bounded. Taking levels of nodes into account seems to bring both better accuracy and expressivity. Since level signatures subsume level types as their last components, we only provide counts of edges of positive, nonpositive, and negative level types. For lack of space, we do not present full distributions of level types nor of level signatures. Positive level types give an even better fit with real linguistic data than the global constraint of wellnestedness (an ill-nested tree need not contain a nonprojective edge of nonpositive level type; cf. Theorem 10). For example, in German less than one tenth of ill-nested trees contain an edge of nonpositive level type. Minimum negative level types for Czech, Slovene, Swedish, and Turkish are respectively −1, −5, −2, and −4. Level signatures combine level types and component degrees, and so give an even more detailed picture of the gaps of non-projective edges. In some languages the actually occurring signatures are quite limited, in others there is a large variation. Because we consider it linguistically relevant, we also count how many non-projective edges contain in their gaps a component rooted in an ancestor of the edge (an ancestor of an edge is any node on the path from the root of the whole tree to the parent node of the edge). The proportions of such nonprojective edges vary widely among languages and for some this property seems highly important. 613 Empirical evidence shows that edge measures of non-projectivity taking into account levels of nodes fit very well with linguistic data. This supports our theoretical results and confirms that properties of non-projective edges provide a more accurate as well as expressive means for describing nonprojective structures in natural language than the constraints and measures considered by Kuhlmann and Nivre (2006). 7 Conclusion In this paper, we evaluate several constraints and measures on non-projective dependency structures. We pursue an edge-based approach giving prominence to properties of individual edges. At the same time, we consider levels of nodes in dependency trees. We find an edge-based approach also more appealing linguistically than traditional approaches based on properties of whole dependency trees or their subtrees. Furthermore, edge-based properties allow machine-learning techniques to model global phenomena locally, resulting in less sparse models. We propose a new edge measure of nonprojectivity, level signatures of non-projective edges. We prove that, analogously to level types, they relate to the constraint of well-nestedness. Our empirical results on twelve languages can be summarized as follows: Among the global constraints, well-nestedness fits best with linguistic data. Among edge measures, the previously unreported measures taking into account levels of nodes stand out. They provide both the best fit with linguistic data of all constraints and measures we have considered, as well as a substantially more detailed capability of describing non-projective structures. The interested reader can find a more in-depth and broader-coverage discussion of properties of dependency trees and their application to natural language syntax in (Havelka, 2007a). As future work, we plan to investigate more languages and carry out linguistic analyses of nonprojective structures in some of them. We will also apply our results to statistical approaches to NLP tasks, such as dependency parsing. Acknowledgement The research reported in this paper was supported by Project No. 1ET201120505 of the Ministry of Education of the Czech Republic. 614 12823 5.38% 677 690 Bulgarian 72703 23.15% Czech 79 13783 16831 5190 15.63% Danish 6 787 811 13349 36.44% Dutch 15 4115 4865 39216 27.75% German 416 10865 10883 17044 5.29% 1 902 Japanese 9071 18.94% Portuguese 7 1713 1718 1534 22.16% Slovene 3 283 340 3306 1.72% 56 57 Spanish 11042 9.77% Swedish 71 1076 1079 4997 11.6% Turkish 14 556 580 200 10 1 211 723 1 1 725 Arabic Bulgarian 211 724 1 0.42% 50097 0.41% 177394 1105437 2.13% 89171 1.06% 179063 5.9% 660394 2.4% Czech Danish Dutch German 23376 940 10209 14605 189 5 349 1198 3 8 37 23190 842 10264 13107 292 78 238 2206 66 22 47 434 23495 942 10564 15803 75 3 2 41 4 h2i / 18507 h2i / 555 h2i / 8061 h2i / 8407 h1i / 2886 h1i / 115 h3i / 1461 h1i / 3112 h3i / 1515 h3i / 100 h1i / 512 h1, 1i / 1503 h4i / 154 h1, 1i / 63 h4i / 201 h3i / 1397 h1, 1i / 115 h4i / 41 h1, 1i / 118 h2, 2i / 476 h0i / 70 h5i / 16 h2, 2i / 52 h1, 1, 1i / 312 h2, 2i / 58 h1, 1, 1i / 16 h1, 1, 1i / 25 h4i / 136 h1, 1, 1i / 48 h2, 2i / 7 h5i / 23 h3, 3i / 98 h2, 4i / 44 h6i / 6 h1, 3i / 16 h2, 2, 2i / 69 h1, 3i / 32. h2, 2, 2i / 6. h3, 3i / 15. h1, 1, 1, 1i / 59. .. .. .. .. 20035 703 9781 10128 19913 685 9697 9526 23570 945 10566 15844 Portuguese 2398 272 24 2466 151 64 2699 3 126511 1.32% 197607 1.37% h1i / 466 h2i / 1670 h2i / 209 h1i / 571 h4i / 186 h3i / 208 h3i / 183 h1, 1i / 113 h5i / 126 h1, 1, 1i / 44 h6i / 113 h2, 2i / 29 h7i / 78 h2, 2, 2i / 13 h1, 1i / 63 h4i / 12 h8i / 49 h1, 1, 1, 1i / 7 h9i / 35. h1, 1, 1, 1, 1i / 6. .. .. 0 1832 0 1820 1667 2702 Japanese 1570 81 12 1484 143 26 1667 25777 2.13% 86028 0.07% 531 59 11 2 547 59 3 2 h2i / 384 h2i / 46 h1i / 67 h3i / 7 h3i / 45 h4i / 4 h4i / 13 h1i / 2 h5i / 12 h1, 1i / 6 h6i / 4 h1, 1, 1, 1i / 4 h7i / 2 h1, 1, 3i / 2. .. 392 57 386 57 550 59 Slovene Spanish 548 58 2 1 180425 1.05% 52273 1.61% Swedish Turkish 1829 813 46 27 9 1 1546 623 204 146 76 55 1847 833 50 8 15 2 h2i / 823 h2i / 341 h1i / 530 h1i / 189 h3i / 114 h1, 1i / 91 h1, 1i / 94 h3i / 53 h0i / 31 h2, 2i / 31 h1, 3i / 27 h1, 1, 1i / 29 h1, 1, 1i / 25 h4i / 19 h4i / 21 h2, 2, 2i / 10 h1, 2i / 19 h3, 3i / 6 h2, 2i / 16. h2, 2, 2, 2i / 6. .. .. 950 345 857 340 1897 841 Table 2: Counts for edge measures interval degree, component degree (for values from 1 to 3; larger values are not included), level type (for positive, nonpositive, and negative values), level signature (up to 10 most frequent values), and numbers of edges with ancestor component roots in their gaps and solely with ancestor component roots in their gaps; the second to last line gives the total numbers of non-projective edges, the last line gives the total numbers of all edges—we exclude edges from technical roots. (The listings need not be exhaustive; an empty cell means count zero.) all proportion of all (%) h1i / 92 h2i / 674 h2i / 56 h3i / 32 h3i / 18 h1i / 10 h4i / 10 h4i / 5 h1, 1i / 8 h5i / 2 h5i / 7 h1, 1, 1i / 1 h6i / 6 h1, 1i / 1 h7i / 4 h2, 2i / 2 h9i / 1. .. ancestor comp. root 39 711 only ancestor comp. r. 39 711 non-projective 211 725 ideg = 1 ideg = 2 ideg = 3 cdeg = 1 cdeg = 2 cdeg = 3 Type > 0 Type ≤ 0 Type < 0 Signature / count Language Table 1: Counts of dependency trees violating global constraints of well-nestedness, planarity, and projectivity; the last line gives the total numbers of dependency trees. (An empty cell means count zero.) 1460 11.16% proportion of all (%) all Arabic 1 150 163 Language ill-nested non-planar non-projective References A. Abeillé, editor. 2003. Treebanks: Building and Using Parsed Corpora, volume 20 of Text, Speech and Language Technology. Kluwer Academic Publishers, Dordrecht. S. Afonso, E. Bick, R. Haber, and D. Santos. 2002. “Floresta sintá(c)tica”: a treebank for Portuguese. In Proceedings of the 3rd Intern. Conf. on Language Resources and Evaluation (LREC), pages 1698–1703. Manuel Bodirsky, Marco Kuhlmann, and Matthias Möhl. 2005. Well-nested drawings as models of syntactic structure. In Proceedings of Tenth Conference on Formal Grammar and Ninth Meering on Mathematics of Language. A. Böhmová, J. Hajič, E. Hajičová, and B. Hladká. 2003. The PDT: a 3-level annotation scenario. In Abeillé (2003), chapter 7. S. Brants, S. Dipper, S. Hansen, W. Lezius, and G. Smith. 2002. The TIGER treebank. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT). S. Buchholz and E. Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLLX. SIGNLL. M. Civit Torruella and Ma A. Martı́ Antonı́n. 2002. Design principles for a Spanish treebank. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT). Alexander Dikovsky and Larissa Modina. 2000. Dependencies on the other side of the Curtain. Traitement Automatique des Langues (TAL), 41(1):67–96. S. Džeroski, T. Erjavec, N. Ledinek, P. Pajas, Z. Žabokrtsky, and A. Žele. 2006. Towards a Slovene dependency treebank. In Proceedings of the 5th Intern. Conf. on Language Resources and Evaluation (LREC). J. Hajič, O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška. 2004. Prague Arabic dependency treebank: Development in data and tools. In Proceedings of the NEMLAR Intern. Conf. on Arabic Language Resources and Tools, pages 110–117. Eva Hajičová, Jiřı́ Havelka, Petr Sgall, Kateřina Veselá, and Daniel Zeman. 2004. Issues of Projectivity in the Prague Dependency Treebank. Prague Bulletin of Mathematical Linguistics, 81:5–22. Jiřı́ Havelka. 2005. Projectivity in Totally Ordered Rooted Trees: An Alternative Definition of Projectivity and Optimal Algorithms for Detecting Non-Projective Edges and Projectivizing Totally Ordered Rooted Trees. Prague Bulletin of Mathematical Linguistics, 84:13–30. Jiřı́ Havelka. 2007a. Mathematical Properties of Dependency Trees and their Application to Natural Language Syntax. Ph.D. thesis, Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic. Jiřı́ Havelka. 2007b. Relationship between Non-Projective Edges, Their Level Types, and Well-Nestedness. In Proceedings of HLT/NAACL; Companion Volume, Short Papers, pages 61–64. 615 Tomáš Holan, Vladislav Kuboň, Karel Oliva, and Martin Plátek. 1998. Two Useful Measures of Word Order Complexity. In Alain Polguère and Sylvain Kahane, editors, Proceedings of Dependency-Based Grammars Workshop, COLING/ACL, pages 21–28. Tomáš Holan, Vladislav Kuboň, Karel Oliva, and Martin Plátek. 2000. On Complexity of Word Order. Traitement Automatique des Langues (TAL), 41(1):273–300. Y. Kawata and J. Bartels. 2000. Stylebook for the Japanese treebank in VERBMOBIL. Verbmobil-Report 240, Seminar für Sprachwissenschaft, Universität Tübingen. M. T. Kromann. 2003. The Danish dependency treebank and the underlying linguistic theory. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT). Marco Kuhlmann and Joakim Nivre. 2006. Mildly NonProjective Dependency Structures. In Proceedings of COLING/ACL, pages 507–514. Solomon Marcus. 1965. Sur la notion de projectivité [On the notion of projectivity]. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 11:181–192. Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-Projective Dependency Parsing using Spanning Tree Algorithms. In Proceedings of HLT/EMNLP, pages 523–530. Ladislav Nebeský. 1979. Graph theory and linguistics (chapter 12). In R. J. Wilson and L. W. Beineke, editors, Applications of Graph Theory, pages 357–380. Academic Press. J. Nilsson, J. Hall, and J. Nivre. 2005. MAMBA meets TIGER: Reconstructing a Swedish treebank from antiquity. In Proceedings of the NODALIDA Special Session on Treebanks. Joakim Nivre. 2006. Constraints on Non-Projective Dependency Parsing. In Proceedings of EACL, pages 73–80. K. Oflazer, B. Say, D. Zeynep Hakkani-Tür, and G. Tür. 2003. Building a Turkish treebank. In Abeillé (2003), chapter 15. K. Simov, P. Osenova, A. Simov, and M. Kouylekov. 2005. Design and implementation of the Bulgarian HPSG-based treebank. In Journal of Research on Language and Computation – Special Issue, pages 495–522. Kluwer Academic Publishers. Neil J. A. Sloane. 2007. On-Line Encyclopedia of Integer Sequences. Published electronically at www.research.att.com/˜njas/sequences/. L. van der Beek, G. Bouma, R. Malouf, and G. van Noord. 2002. The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN). Kateřina Veselá, Jiřı́ Havelka, and Eva Hajičová. 2004. Condition of Projectivity in the Underlying Dependency Structures. In Proceedings of COLING, pages 289–295.
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.