Scholarly article on topic 'Automatic Extraction of Hypernym & Meronym Relations in English Sentences Using Dependency Parser'

Automatic Extraction of Hypernym & Meronym Relations in English Sentences Using Dependency Parser Academic research paper on "Computer and information sciences"

CC BY-NC-ND
Academic journal
Procedia Computer Science
OECD Field of science
Keywords: Relation extraction; Hypernym; Meronym; Semantic relation; Lexico-syntactic pattern; WordNet; Dependency parser

Abstract of research paper on Computer and information sciences, author of scientific article — N. Sheena, Smitha M. Jasmine, Shelbi Joseph

Abstract. Relation extraction is the process of finding relationships between named entities in a text. Automatic extraction of semantic relations between pairs of nouns is an important task with many potential applications, including information retrieval, information extraction, text summarization, machine translation, question answering, thesaurus construction and word sense disambiguation. This paper describes an approach that extracts hypernym and meronym relations between proper nouns in the sentences of a given text. Machine learning techniques are used to automatically extract proper nouns with a given relationship from a text corpus. The work is based on dependency parsing with ADTree and Naive Bayes classifiers, and it analyses the paths between noun pairs in the dependency parse trees of the sentences.

Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 93 (2016) 539-546

6th International Conference On Advances In Computing & Communications, ICACC 2016, 6-8 September 2016, Cochin, India

Automatic Extraction of Hypernym & Meronym Relations in English Sentences Using Dependency Parser

Sheena N. (a,*), Smitha M. Jasmine (a), Shelbi Joseph (b)

(a) Dept. of Computer Science & Engineering, College of Engineering & Management Punnapra, Alappuzha - 688003, India
(b) Division of Information Technology, School of Engineering, CUSAT - 682022, India

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Organizing Committee of ICACC 2016

Keywords: Relation extraction; Hypernym; Meronym; Semantic relation; Lexico-syntactic pattern; WordNet; Dependency parser

1. Introduction

Extracting relationships between entities from text is one of the most challenging problems in information extraction. Many natural language processing applications depend on ontologies such as WordNet to obtain prior knowledge about the semantic relationships between words. Unfortunately, WordNet is limited in scope, and it is time consuming and expensive to maintain and extend. Furthermore, WordNet has no concept of probability: for a given word it stores a list of its relations to other words, but not the probability that each relationship occurs in normal usage. Recently, substantial interest has been directed towards the automatic detection of semantic relations between words. Automatic extraction of semantic relations between nouns from text corpora is important to many natural language processing tasks; for example, one would like to search for all terms that bear a given semantic relation to some other term. The most important semantic relations to extract are the ISA relation, otherwise known as the hyponym-hypernym relation, and the meronym or part-whole relation. In this paper we propose a method to improve the automatic extraction of hypernym/hyponym and meronym relationships between proper nouns which do not exist in WordNet.

* Corresponding author. Tel.: +0-938-721-8529. E-mail address: sheena.cemp@gmail.com

ISSN 1877-0509. doi:10.1016/j.procs.2016.07.269

The paper is organised as follows. Section 2 gives a brief survey of the literature. Section 3 describes the identification of dependency patterns from a corpus using dependency relation features, together with the generalisation and extension of dependency paths. Section 4 explains the steps involved in the testing phase of hypernym and meronym relation extraction. Section 5 presents the experimental results, and Section 6 summarises the work.

2. Related work

There are two approaches to extracting hypernym relations from text: the pattern-based approach and the co-occurrence-based approach. The pattern-based approach is further divided into the lexico-syntactic pattern method and the dependency pattern method. The pattern-based approach was first used in [12, 11] to extract hyponym relations from a raw corpus. To discover a new pattern, a list of terms for which a particular relation R holds is gathered; the places where these terms occur in the corpus are located, and the commonalities among their environments are identified. This technique was also applied to the meronymy relation, but the patterns for this relation do not identify it uniquely. The paper [1] used pattern-based techniques and other heuristics to extract meronymy (part-whole) relations. The work in [10] improved upon [1] using a machine learning filter. In [7], part-of-speech patterns are used to extract a subset of hyponym relations involving proper nouns. Limited recall and precision are the problems associated with using patterns to extract semantic information from text. These problems are addressed in [15] by using syntactic constructions such as appositions (APOS) in addition to CN PN.

In [19], dependency patterns are used as the features for discovering hypernyms in a text; lexico-syntactic patterns can be generalized using dependency paths. A semantic taxonomy is constructed in [20] using the same method as [19], where the probability of a noun being a coordinate term is used to raise the probability of a hypernym relation. In [2], an unlabelled hierarchy of nouns was built using a bottom-up clustering method; multiple senses of a word are not considered, but noise is filtered. The syntactic co-occurrence approach of [19] used a minimal edit distance algorithm to discover the patterns of an ISA relation and to learn lexico-POS patterns automatically. That algorithm is designed for terascale corpora, so recall is valued over precision, although it is impossible to know the number of ISA relations in any nontrivial corpus. A machine learning approach to semantic relations between noun compounds was discussed in [18]. Automatic acquisition and expansion of hypernym links from single words to multiword terms, through lexico-syntactic patterns in a large corpus, are discussed in [16]. The precision and recall of Hearst patterns are increased in [17] and [3] through HMMs and latent semantic analysis, respectively. The work in [9] automatically builds a semantic hierarchy of ISA relations based on word embeddings.

3. Identification of dependency patterns from corpus

The first phase of the relation extraction system identifies the dependency patterns for hypernym and meronym relations. The system architecture is shown in Figure 1. In the first phase the corpus is preprocessed to extract noun pairs from each sentence. The preprocessing steps are sentence segmentation, POS tagging, tokenization, stemming, and noun extraction and pairing. Noun pairs are built in order to find whether any semantic relation holds between them. If a sentence contains n different nouns, there are nP2 = n(n - 1) different ordered pairs of nouns. Only some of the noun pairs extracted from the corpus are in a hypernym or meronym relationship. For each noun, its meronyms and hypernyms are extracted from WordNet [6]. This results in a large set of pairs of related words, labeled according to the type of their relationship.
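The pairing step can be illustrated with a minimal Python sketch. The function name `noun_pairs` is hypothetical, and the sketch assumes that the nouns of a sentence have already been extracted by the preprocessing steps; it only enumerates the nP2 ordered pairs.

```python
from itertools import permutations


def noun_pairs(nouns):
    """Return all ordered pairs of distinct nouns, i.e. nP2 = n * (n - 1) pairs."""
    unique = list(dict.fromkeys(nouns))  # drop repeated nouns, keep sentence order
    return list(permutations(unique, 2))


# Nouns of the example sentence used later in this section.
pairs = noun_pairs(["water", "hydrogen", "atom", "deuterium"])
print(len(pairs))  # 12
```

Ordered pairs are used because both relations are directional: (atom, deuterium) and (deuterium, atom) are different candidates.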

3.1. Dependency relation features

In [19, 20], dependency parse trees are derived from the Minipar parser [14], which is a shallow parser. Here we instead apply the Stanford parser [4], a Java implementation of probabilistic natural language parsers (highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser), to build the dependency tree directly. A dependency parser shows the dependency relationships [5] between the words in a sentence. The dependency parse of a given sentence is essentially a set of triplets, each composed of a grammatical relation and the pair of words from the sentence between which the relation holds: rel_i(w_j, w_k), where rel_i is the dependency relation between the words w_j and w_k.

Fig. 1. System Architecture - Phase 1.

Table 1. Dependency tree of the sentence "Heavy water rich in the doubly heavy hydrogen atom called deuterium."

[amod(water-2, Heavy-1), dep(rich-3, water-2), prep(rich-3, in-4), det(atom-9, the-5), advmod(heavy-7, doubly-6), amod(atom-9, heavy-7), nn(atom-9, hydrogen-8), pobj(in-4, atom-9), partmod(atom-9, called-10), dobj(called-10, deuterium-11)]

Fig. 2. The Dependency tree of the sentence "Heavy water rich in the doubly heavy hydrogen atom called deuterium"

Table 1 shows the dependency parse of the sentence "Heavy water rich in the doubly heavy hydrogen atom called deuterium" and Figure 2 shows the corresponding dependency tree. This example illustrates that the dependency path between a noun pair captures the information about the relationship between the nouns better than the words of the unparsed sentence. Consider the noun pair "atom" and "deuterium". The only word between these nouns in the sentence is "called", so the word "called" is a pattern for hypernym extraction. The dependency features are formed from the resulting dependency relationships between words. First, the shortest dependency path of five links or fewer is found between each pair of nouns that exhibits a hypernym relation. If the two nouns are directly dependent on each other, that dependency is returned directly. Otherwise, a breadth-first search is performed to find the shortest path between the two words.
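The shortest-path step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `shortest_path` and the `(relation, head, dependent)` tuple layout are assumptions, words are used as node keys without their position indices, and the dependency edges are treated as undirected for the search.

```python
from collections import deque

# Typed dependencies of the example sentence (Table 1) as (relation, head, dependent).
TRIPLETS = [
    ("amod", "water", "Heavy"), ("dep", "rich", "water"), ("prep", "rich", "in"),
    ("det", "atom", "the"), ("advmod", "heavy", "doubly"), ("amod", "atom", "heavy"),
    ("nn", "atom", "hydrogen"), ("pobj", "in", "atom"),
    ("partmod", "atom", "called"), ("dobj", "called", "deuterium"),
]


def shortest_path(triplets, source, target, max_links=5):
    """Breadth-first search for the shortest triplet path between two words.

    Returns the list of triplets on the path, or None when the words are
    not connected within max_links links.
    """
    graph = {}
    for triplet in triplets:
        _, head, dependent = triplet
        graph.setdefault(head, []).append((dependent, triplet))
        graph.setdefault(dependent, []).append((head, triplet))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        word, path = queue.popleft()
        if word == target:
            return path
        if len(path) == max_links:
            continue  # do not extend paths beyond the link limit
        for neighbour, triplet in graph.get(word, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [triplet]))
    return None


path = shortest_path(TRIPLETS, "atom", "deuterium")
print(path)  # [('partmod', 'atom', 'called'), ('dobj', 'called', 'deuterium')]
```

A directly dependent pair such as (atom, hydrogen) comes back as the single triplet `nn(atom, hydrogen)`, since breadth-first search reaches direct neighbours first.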

Consider the dependency tree in Figure 2. The triplets on the path from "atom" to "deuterium" are "partmod(atom, called), dobj(called, deuterium)". The dependency path is generalized by removing the nouns. Suppose we have a sentence with similar semantics in which the nouns are replaced by other nouns, i.e., "atom" is replaced by "metal" and "deuterium" by "silver". If the nouns themselves were kept in the path feature, we would end up with two different paths for two sentences that have similar semantics. Therefore, in this study the two nouns of the pair are replaced by placeholders, so that the path is represented by the dependency relation types and the intervening words. For example, the triplet path feature extracted for the (atom, deuterium) pair is "partmod(noun1, called), dobj(called, noun2)" and the path feature extracted for the (atom, hydrogen) pair is "nn(noun1, noun2)". The words on the dependency path between a noun pair give sufficient information to identify their relationship.
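A short Python sketch of this generalization follows. The function name `generalize` and the tuple representation of triplets are illustrative assumptions; the output string format mirrors the path features quoted in the text.

```python
def generalize(path, noun1, noun2):
    """Replace the two nouns of the pair by the placeholders noun1 and noun2."""
    placeholder = {noun1: "noun1", noun2: "noun2"}
    return ", ".join(
        f"{rel}({placeholder.get(head, head)}, {placeholder.get(dep, dep)})"
        for rel, head, dep in path
    )


# The same construction with two different noun pairs.
path_a = [("partmod", "atom", "called"), ("dobj", "called", "deuterium")]
path_b = [("partmod", "metal", "called"), ("dobj", "called", "silver")]

feature_a = generalize(path_a, "atom", "deuterium")
feature_b = generalize(path_b, "metal", "silver")
print(feature_a)  # partmod(noun1, called), dobj(called, noun2)
print(feature_a == feature_b)  # True
```

Both sentences map to one feature, which is what allows a pattern learned from one noun pair to apply to unseen pairs.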

This sentence has four nouns (water, hydrogen, atom and deuterium) and hence twelve pairs of nouns. From these, the noun pairs that exhibit a hypernym relation are identified; here only one pair of nouns is in a hypernym relation. In this example there is a single path between the nouns of a pair. However, there may be more than one path between a noun pair if one or both nouns appear multiple times in the sentence. In such cases, we select the shortest path between the nouns. The patterns for the extraction of hypernym and meronym relations in this system are given in Table 2 and Table 3.

Table 2. Hypernym patterns

NP0 such as NP1 {, NP2, ..., (and | or) NPi}, i >= 1

such NP as {NP,}* {(or | and)} NP

NP {, NP}* {,} or other NP

NP {, NP}* {,} and other NP

NP {,} including {NP,}* {or | and} NP

NP {,} especially {NP,}* {or | and} NP

NPy like NPx

NPy called NPx

NPx is a NPy

NPx a kind of NPy
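For comparison with the dependency-based features used in this work, the first surface pattern of Table 2 can be matched with a regular expression. The Python sketch below is only an illustration under strong assumptions: every NP is approximated by a single word, and the names `SUCH_AS` and `such_as_pairs` are hypothetical.

```python
import re

# "NP0 such as NP1 {, NP2, ..., (and | or) NPi}", with each NP reduced to one word.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?:,? (?:and|or) \w+)?)")


def such_as_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration on commas and on the final "and" / "or".
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms)
    return pairs


print(such_as_pairs("metals such as silver, gold and copper"))
# [('silver', 'metals'), ('gold', 'metals'), ('copper', 'metals')]
```

A surface pattern of this kind breaks as soon as modifiers or multiword noun phrases intervene, which is the motivation for expressing the same patterns as dependency paths.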

Table 3. Meronym patterns

Pattern | Condition | Example

NPx PPy | PPy starts with "of" or "inside" | door of the car; walls inside the building

NPx PPy | PPy starts with "above" | they amputate his leg above the knee

NPy's NPx | | building's basement

NPy verb NPx | verb is "have" | car has an engine

NPx verb NPz PPy | PPy starts with "of"; NPz is "a part" or "a member" | finger is a part of hand; Iceland is a member of NATO

3.2. Generalization and extension of dependency path

In addition to the shortest triplet path between the nouns of a pair, a satellite link for each noun of the pair is also considered. A satellite link is a link between one noun of the pair and any other node in the dependency tree that is not on the shortest triplet path. Satellite links are included because some edges that are not strictly part of the dependency path are vital for providing reliable evidence of hypernymy. They make it possible for dependency paths to express the lexical patterns originally proposed in [1] as good candidates for extracting evidence of hypernymy. The dependency paths with satellite links extracted from the sentence "The library has a large collection of classic books by such authors as Herrick and Shakespeare", whose dependency parse is shown in Table 4, are shown in Table 5.

Table 4. Dependency tree of the sentence "The library has a large collection of classic books by such authors as Herrick and Shakespeare."

[det(library-2, The-1), nsubj(has-3, library-2), det(collection-6, a-4), amod(collection-6, large-5), dobj(has-3, collection-6), prep(collection-6, of-7), amod(books-9, classic-8), pobj(of-7, books-9), prep(has-3, by-10), amod(authors-12, such-11), pobj(by-10, authors-12), prep(authors-12, as-13), pobj(as-13, Herrick-14), cc(Herrick-14, and-15), conj(Herrick-14, Shakespeare.-16)]

Certain function words, such as "such" in "such NP as NP", "and" in "NP and other NP" and "or" in "NP or other NP", are very important parts of the lexico-syntactic patterns, but they are not included in the shortest triplet path. In the earlier example (Table 1), the satellite links for the noun "atom" are "det(atom, the), amod(atom, heavy), nn(atom, hydrogen), pobj(in, atom)"; in that example these satellite links are not important. Therefore, only the satellite links that occur five or more times in the corpus are retained.

Table 5. Dependency paths with optional satellite links for the dependency tree shown in table 4, for the noun pair (author,Herrick)

Dependency Paths with Optional Satellite Links for (author,Herrick)

prep(authors, as) pobj(as, Herrick), amod(authors, such)

prep(authors, as) pobj(as, Herrick), pobj(by, authors)

prep(authors, as) pobj(as, Herrick), cc(Herrick, and)

prep(authors, as) pobj(as, Herrick), conj(Herrick, Shakespeare)
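The rows of Table 5 can be reproduced with a small Python sketch. The function names and the tuple layout are assumptions made for illustration; the corpus-level frequency threshold on satellite links is not applied here.

```python
# Typed dependencies of the example sentence (Table 4) as (relation, head, dependent).
TRIPLETS = [
    ("det", "library", "The"), ("nsubj", "has", "library"),
    ("det", "collection", "a"), ("amod", "collection", "large"),
    ("dobj", "has", "collection"), ("prep", "collection", "of"),
    ("amod", "books", "classic"), ("pobj", "of", "books"),
    ("prep", "has", "by"), ("amod", "authors", "such"),
    ("pobj", "by", "authors"), ("prep", "authors", "as"),
    ("pobj", "as", "Herrick"), ("cc", "Herrick", "and"),
    ("conj", "Herrick", "Shakespeare"),
]


def satellite_links(triplets, path, noun):
    """Triplets that touch the noun but are not on the shortest triplet path."""
    return [t for t in triplets if noun in t[1:] and t not in path]


def paths_with_satellites(triplets, path, noun1, noun2):
    """Pair the shortest path with each single satellite link of either noun."""
    links = satellite_links(triplets, path, noun1) + satellite_links(triplets, path, noun2)
    return [(tuple(path), link) for link in links]


path = [("prep", "authors", "as"), ("pobj", "as", "Herrick")]
extended = paths_with_satellites(TRIPLETS, path, "authors", "Herrick")
for _, link in extended:
    print(link)
```

For the pair (authors, Herrick) this yields four extended paths, one each for `amod(authors, such)`, `pobj(by, authors)`, `cc(Herrick, and)` and `conj(Herrick, Shakespeare)`.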

The distributive nature of nouns linked by a syntactic conjunction is also generalized: all words in a conjunction use the same dependency path to a noun outside the conjunction. For example, in "Metals like silver and gold are used for making ornaments", the shortest triplet path for the noun pair (metal, gold) and the satellite links for the nouns "metal" and "gold" are the same as those of the noun pair (metal, silver), because the nouns "silver" and "gold" are linked by the syntactic conjunction "and". An additional pattern obtained for hypernyms is {NP}* {and | or} NP kind of NP. After all features for a particular noun pair have been obtained, the pair is represented as a feature vector, that is, a set containing each feature and its frequency. For each noun pair, we construct a feature vector:

$$FV(np) = \{(f_1, c_1), (f_2, c_2), \ldots, (f_n, c_n)\} \qquad (1)$$

where $f_i$ is a feature, $c_i$ is the frequency of that feature and n is the total number of features. Each $(f_i, c_i)$ pair in the feature vector is then converted to a normalized form using the equation

$$F_i = \log \frac{c_i}{\sum_{j=1}^{n} c_j} \qquad (2)$$

where n is the total number of features. The normalized feature vector is then given by equation (3):

$$NFV = \{F_1, F_2, \ldots, F_n\} \qquad (3)$$

By averaging the $F_i$ values we obtain the feature value for the feature vector.
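The construction in equations (1) to (3) can be sketched in Python. Two points are assumptions rather than facts from the source: the normalization is taken to be the logarithm of the relative frequency, F_i = log(c_i / sum of all c_j), with the natural logarithm, and the feature counts used below are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def feature_vector(observed_paths):
    """FV(np): map each generalized path feature f_i to its frequency c_i."""
    return Counter(observed_paths)


def normalize(fv):
    """Assumed form of the normalization: F_i = log(c_i / sum_j c_j)."""
    total = sum(fv.values())
    return {feature: math.log(count / total) for feature, count in fv.items()}


def feature_value(nfv):
    """Average of the F_i values, giving one number per noun pair."""
    return sum(nfv.values()) / len(nfv)


# Hypothetical observations of one noun pair across a corpus.
observed = ["partmod(noun1, called), dobj(called, noun2)"] * 3 + ["nn(noun1, noun2)"]
fv = feature_vector(observed)
nfv = normalize(fv)
value = feature_value(nfv)
print(fv["nn(noun1, noun2)"], round(value, 4))
```

Under this assumption every F_i is at most zero, so the averaged feature value is never positive.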

3.3. Classification Approaches

Machine learning approaches are used to classify each sentence, for a given noun pair, as expressing a hypernym relation, a meronym relation, or no relation. We extracted positive and negative training instances (hypernym/unrelated pairs and meronym/unrelated pairs) from the training data for each class. We considered only pairs that appear in the same sentence. A sentence may exhibit a hypernym or meronym relation for one noun pair but not for another. For instance, our example sentence is a positive sentence for the (atom, deuterium) noun pair. However, it is a negative sentence for the (atom, hydrogen) noun pair, i.e., it does not describe a hypernym relation between this pair of nouns. Dependency patterns with satellite links are used as the features for the training process. We built a feature vector for each noun pair encountered in the same sentence in our training data and trained two classifiers, ADTree [8] and Naive Bayes [13], on this data. Training statistics for the hypernym relation are shown in Figure 3.

[Figure 3: classifier output for the alternating decision tree built on the full training set. The tree splits on the single attribute Path. Legend: -ve = hypernym, +ve = unrelated. Tree size (total number of nodes): 19. Leaves (number of predictor nodes): 13.]

Fig. 3. Training statistics for hypernym relation.

The classification algorithms used for the hypernym and meronym relations are the ADTree algorithm and the Naive Bayes classifier, respectively. For the evaluation, the number of boosting iterations used for ADTree is 10, and 3 nodes are expanded.

We collected 312 noun pairs for our training set and tagged them for hypernymy and meronymy separately and automatically using WordNet 2.1. There were 104 positive examples for hypernymy and 80 for meronymy. A total of 141 dependency paths were collected for use as features.
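To make the classification step concrete, the following Python sketch trains a Gaussian Naive Bayes model on a single real-valued feature. It is not the classifier configuration used in the experiments: the function names are hypothetical, the training values are invented toy numbers, and the Gaussian class-conditional density is one simple choice among the estimators discussed in [13].

```python
import math
from statistics import mean, pstdev


def train(samples):
    """samples: list of (feature_value, label). Returns {label: (prior, mean, std)}."""
    model = {}
    for label in {lab for _, lab in samples}:
        values = [x for x, lab in samples if lab == label]
        # A small floor on the standard deviation avoids division by zero.
        model[label] = (len(values) / len(samples), mean(values), max(pstdev(values), 1e-6))
    return model


def log_gaussian(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def classify(model, x):
    """Pick the label with the highest log prior plus log likelihood."""
    return max(
        model,
        key=lambda lab: math.log(model[lab][0]) + log_gaussian(x, *model[lab][1:]),
    )


# Toy training data (hypothetical values, not taken from the experiments).
training = [
    (-0.30, "meronym"), (-0.25, "meronym"), (-0.28, "meronym"),
    (-0.02, "unrelated"), (-0.05, "unrelated"), (-0.01, "unrelated"),
]
model = train(training)
print(classify(model, -0.27), classify(model, -0.03))
```

With a single feature the independence assumption of Naive Bayes is trivially satisfied; with several features the per-feature log likelihoods would be summed.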

4. Automatic Extraction of Hypernym & Meronym related proper nouns

Phase 2 of the system architecture, shown in Figure 4, is the testing phase. First the test corpus is preprocessed and each sentence in it is parsed with a dependency parser. A feature vector is then extracted for each noun pair in the test corpus, and the classifier determines whether the noun pair exhibits the semantic relation or not.

5. Experimental Results

We use precision, recall, and F-score as our metrics to evaluate the performances of the method. Precision and recall are defined as follows:

$$\text{Precision} = \frac{\text{Number of correctly retrieved relations}}{\text{Number of relations retrieved}} \qquad (4)$$

$$\text{Recall} = \frac{\text{Number of correctly retrieved relations}}{\text{Number of correct relations}} \qquad (5)$$

The F-score is the harmonic mean of recall and precision:

$$\text{F-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$

Fig. 4. System Architecture - Phase 2.

Among the test set of 310 noun pairs, the human annotators agreed upon 27 noun pairs as being in a hypernym relationship and 19 noun pairs as being in a meronym relationship. The system returned 29 noun pairs as hypernym pairs and 21 noun pairs as meronym pairs. Of the 29 returned hypernym pairs only 21 actually hold the hypernym relationship, and of the 21 returned meronym pairs only 14 hold the meronym relationship. The precision, recall and F-score on the test corpus are shown in Table 6.

Table 6. Precision, Recall and F-score of the test corpus

Relation    Precision    Recall    F-score

Hypernym    0.7241       0.7778    0.7500

Meronym     0.6667       0.7368    0.7000
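The values in Table 6 follow directly from the counts reported above, as the short Python check below shows. The function name `evaluate` is illustrative; the counts are those given in the text.

```python
def evaluate(correctly_retrieved, retrieved, correct):
    """Return (precision, recall, F-score) as defined in equations (4) to (6)."""
    precision = correctly_retrieved / retrieved
    recall = correctly_retrieved / correct
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypernym: 21 correct among 29 retrieved, 27 annotated pairs.
hypernym = evaluate(21, 29, 27)
# Meronym: 14 correct among 21 retrieved, 19 annotated pairs.
meronym = evaluate(14, 21, 19)

print([round(v, 4) for v in hypernym])  # [0.7241, 0.7778, 0.75]
print([round(v, 4) for v in meronym])   # [0.6667, 0.7368, 0.7]
```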

Tables 7 and 8 compare the F-scores of our classifiers with those of WordNet and of the best classifiers from previous work, for the hypernym and meronym relations respectively.

Table 7. F-scores of Hypernym classifiers

Classifier                                         F-score

ADTree classifier                                  0.7500

Best hypernym classifier in (Snow et al., 2005)    0.3592

Hearst patterns classifier                         0.1417

WordNet                                            0.2312

Table 8. F-scores of Meronym classifiers

Classifier                                               F-score

Naive Bayes classifier                                   0.7000

Quinlan's C4.5 algorithm (ISS system) (Girju et al.)     0.8094

WordNet meronymy                                         0.5116

6. Conclusion

This paper introduced a relation extraction approach based on dependency parsing and machine learning to identify hypernym and meronym relations among the nouns in the sentences of a natural language text. Unlike syntactic parsing, dependency parsing captures the semantic predicate-argument relationships between entities in addition to the syntactic relationships. The shortest triplet paths between the noun pairs in the dependency parse trees of the sentences, together with the satellite links for each noun of the pair, are extracted and used as the feature vector for classification. Generalization and extension of the dependency path give more accurate features for semantic relation extraction. Supervised machine learning approaches, namely the ADTree and Naive Bayes classifiers, have been applied to this task. The approach can generalize beyond WordNet because the classifier can infer the appropriate classification for very specific, domain-limited terms (e.g., proper nouns) that are not in WordNet. We evaluated our classifiers and compared their performance with previous approaches. The best F-scores were achieved with the ADTree classifier (75%) for the hypernym relation and the Naive Bayes classifier (70%) for the meronym relation. The F-score of the hypernym classifier is better than the results reported in previous work, but our results for the meronym relation were not as good as expected. The comparison with previous work shows the power of the dependency path for extracting semantic relations from text.

References

1. Matthew Berland and Eugene Charniak. Finding parts in very large corpora. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 57-64, 1999.

2. Sharon A. Caraballo. Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 120-126. Association for Computational Linguistics, 1999.

3. Scott Cederberg and Dominic Widdows. Using LSA and noun coordination information to improve the precision and recall of automatic hyponymy extraction. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, pages 111-118. Association for Computational Linguistics, 2003.

4. Marie-Catherine De Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 449-454, 2006.

5. Marie-Catherine De Marneffe and Christopher D. Manning. The Stanford typed dependencies representation. In Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation, pages 1-8. Association for Computational Linguistics, 2008.

6. Christiane Fellbaum. WordNet. Wiley Online Library, 1998.

7. Michael Fleischman, Eduard Hovy, and Abdessamad Echihabi. Offline strategies for online question answering. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03, 1:1-7, 2003.

8. Yoav Freund and Llew Mason. The alternating decision tree learning algorithm. In ICML, volume 99, pages 124-133, 1999.

9. Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. Learning semantic hierarchies via word embeddings. In ACL (1), pages 1199-1209, 2014.

10. Roxana Girju, Adriana Badulescu, and Dan Moldovan. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 1-8. Association for Computational Linguistics, 2003.

11. Marti A. Hearst. Automated discovery of WordNet relations. WordNet: An Electronic Lexical Database, pages 131-152, 1998.

12. Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In 14th International Conference on Computational Linguistics, pages 23-28, 1992.

13. George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 338-345. Morgan Kaufmann Publishers Inc., 1995.

14. Dekang Lin. Dependency-based evaluation of Minipar. Treebanks - Building and Using Parsed Corpora, pages 317-329, 2003.

15. Gideon S. Mann. Fine-grained proper noun ontologies for question answering. In Proceedings of the 2002 workshop on Building and using semantic networks (SEMANET '02), 11(Section4):1-7, 2002.

16. Emmanuel Morin and Christian Jacquemin. Automatic acquisition and expansion of hypernym links. Computers and the Humanities, 38(4):363-396, 2004.

17. Alan Ritter, Stephen Soderland, and Oren Etzioni. What is this, anyway: Automatic hypernym discovery. In AAAI Spring Symposium: Learning by Reading and Learning to Read, pages 88-93, 2009.

18. Barbara Rosario and Marti Hearst. Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82-90, 2001.

19. Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. Learning syntactic patterns for automatic hypernym discovery. Advances in Neural Information Processing Systems 17, 17:1297-1304, 2004.

20. Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. Semantic taxonomy induction from heterogenous evidence. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, (July):801-808, 2006.