Procedia - Social and Behavioral Sciences 95 (2013) 442-446

5th International Conference on Corpus Linguistics (CILC2013)

Overcoming Problems in Automated Appraisal Recognition: the Attitude System in Inscribed Appraisal

Abstract

Since Appraisal annotation typically requires manual annotators and is time-intensive, the amount of available Appraisal-annotated corpora is limited. While widespread success has been achieved in sentiment analysis with regard to the overall semantic orientation of a text, the Attitude subsystem of Appraisal remains a holdout. For this study, a basic automatic recognizer was programmed and tested in order to identify problem areas and provide clues as to their possible solutions. It deals exclusively with inscribed Appraisal and does not distinguish between authorial and non-authorial evaluation.

© 2013 The Authors. Published by Elsevier Ltd. Selection and peer-review under responsibility of CILC2013.

Keywords: sentiment analysis; appraisal; NLP; recognizer; computational linguistics.

1. Introduction

Appraisal theory (Martin and White, 2005) was developed as part of a literacy program. It allows us to analyze the ways in which things, behaviors or people are evaluated and how writers and speakers position themselves in the text. Annotating a text in terms of Appraisal is not synonymous with finding its overall semantic orientation, since Appraisal tries to deal with the finer details. The fact that Appraisal can be inscribed (explicit) or invoked (implicit), along with its polymorphous nature, makes automatic annotation a difficult task. This study deals only with inscribed Appraisal, and only with the Attitude system (Engagement and Graduation are outside its scope). For an overview of the Appraisal system, please refer to Appendix A.

Fiorella Carla Dotti*

Autonomous University of Madrid, C/Einstein 5, Madrid 28049, Spain

* Corresponding author. Tel.: +34-617-119-894. E-mail address: fiorella.dotti@estudiante.uam.es

1877-0428 © 2013 The Authors. Published by Elsevier Ltd. Selection and peer-review under responsibility of CILC2013. doi: 10.1016/j.sbspro.2013.10.667

The fact that manual annotators are required limits the amount of available Appraisal-annotated corpora. Using a small number of ready-made corpora presents problems, as pointed out by Lindquist and Levin: "In the real world, for economic or copyright reasons, only a limited number of corpora will be available to any individual scholar. Unfortunately, this sometimes leads to research being carried out on less than optimally suitable material, material which is insufficient or skewed in a particular direction and thus not representative of the type of language which is meant to be under investigation." (Lindquist & Levin, in Mair and Hundt, 2000).

It would be difficult to use most existing software for automatic Appraisal analysis, except that developed by Sano (2011) and, to some extent, Garg et al. (2006). Other software was developed with a different goal: to extract the overall sentiment of a text, most often for commercial uses. It does not aim to identify all tokens or to divide them into more detailed categories equivalent to those used in Appraisal, even when it makes some use of Appraisal theory.

Appraisal was developed as part of research carried out in the framework of a literacy program, and it deals with the way in which speakers engage their audience and position themselves, a hard terrain to navigate for most foreign language learners. This means that Appraisal could be a useful tool in second language acquisition (SLA).

For this study, I set out to develop a basic automatic Appraisal recognizer, with no disambiguation strategies whatsoever, in order to establish a baseline value and reveal the most common kinds of error that such a recognizer would encounter.

2. Method

In order to train the recognizer, a dictionary is necessary. Although it is possible to use a web-based dictionary, I decided against it because of problems found in previous research, most notably Taboada et al. (2009): "...although usable, dictionaries created using the Google search engine were unstable. When rerun, the results for each word were subject to change, sometimes by extreme amounts, something that Kilgarriff (2007) also notes, arguing against the use of Google for linguistic research of this type."

Thus, I decided to compile a small training corpus. News articles concerning financial and technological companies were downloaded in plain text format from the web versions of the following English-language newspapers: The New York Times, The Washington Post, LA Times and The Chicago Tribune. No HTML code or other artifacts were left in the text. A training corpus consisting of 32 articles was selected. A further 26 articles (13 on finance and technology, and 13 from general news) were set apart for testing purposes. The articles were loaded into a new project in UAM CorpusTool (O'Donnell, 2008) and manually annotated according to a modified version of the Appraisal_Max scheme that only takes into account the Attitude subsystem. For a complete version of this scheme, see Figure 2. Annotation was done following the guidelines in The Language of Evaluation: Appraisal in English (Martin and White, 2005). Invoked Appraisal was ignored.

All Appraisal tokens were extracted from this small corpus and loaded into different lists according to the Appraisal system and subsystem they belonged to, as well as their polarity, e.g., the word "angry" was added to the corresponding "Attitude:Affect:Dis-Satisfaction:Dis-Pleasure:Negative" list, the word "pain" was added to the "Attitude:Affect:Un-Happiness:Misery-Cheer:Negative" list, and so on. When there was more than one possible option, the majority sense of the word was kept.
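The majority-sense rule described above can be illustrated with a minimal Python sketch. It assumes the manual annotations are available as (word, category, polarity) triples; the function name `build_lexicon` and the sample data are hypothetical, and the third "pain" annotation is invented only to show how a minority sense is discarded.

```python
from collections import Counter, defaultdict


def build_lexicon(annotated_tokens):
    """Map each word to its most frequent (category, polarity) annotation."""
    counts = defaultdict(Counter)
    for word, category, polarity in annotated_tokens:
        counts[word.lower()][(category, polarity)] += 1
    # Keep only the majority sense of each word.
    return {word: senses.most_common(1)[0][0] for word, senses in counts.items()}


# Illustrative annotations; the last "pain" entry is an invented minority sense.
SAMPLE_ANNOTATIONS = [
    ("angry", "Attitude:Affect:Dis-Satisfaction:Dis-Pleasure", "Negative"),
    ("pain", "Attitude:Affect:Un-Happiness:Misery-Cheer", "Negative"),
    ("pain", "Attitude:Affect:Un-Happiness:Misery-Cheer", "Negative"),
    ("pain", "Attitude:Affect:In-Security:Dis-Quiet", "Negative"),
]

lexicon = build_lexicon(SAMPLE_ANNOTATIONS)
```

Storing a single sense per word keeps lookup trivial, at the cost of the sense-related errors reported in the results.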

The dictionary was enriched with Appraisal terms generously provided to me by another researcher. A small program was created that performed the following functions:

• Load the lexicon from the files.

• Prompt the user to insert the text that they wanted tagged.

• Break the text down into tokens, filtering out punctuation marks and converting to lower case.

• Match each token against the dictionary to see if an entry for that token exists.

• Save the text in an output file, inserting a tag for each recognized token. The tags cover 14 categories, according to type and polarity.

The recognizer has no disambiguation strategies whatsoever and makes no use of context. It is also unable to handle multi-word expressions.
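The steps listed above can be sketched in a few lines of Python. This is not the program used in the study: the tag format, the function name `tag_text` and the two-entry lexicon are assumptions made for illustration, and file loading and user prompting are left out.

```python
import string

# Hypothetical two-entry lexicon; the real dictionary was loaded from files.
SAMPLE_LEXICON = {
    "angry": "Attitude:Affect:Dis-Satisfaction:Dis-Pleasure:Negative",
    "pain": "Attitude:Affect:Un-Happiness:Misery-Cheer:Negative",
}


def tag_text(text, lexicon):
    """Insert a tag after every whitespace-separated word found in the lexicon."""
    tagged = []
    for raw in text.split():
        # Normalize for lookup: strip surrounding punctuation, lower-case.
        token = raw.strip(string.punctuation).lower()
        label = lexicon.get(token)
        # The <label> tag format is an assumption, not the study's format.
        tagged.append(f"{raw}<{label}>" if label else raw)
    return " ".join(tagged)


tagged_sentence = tag_text("She was Angry.", SAMPLE_LEXICON)
```

Because each token is looked up in isolation, a sketch like this shares the limitations noted above: it cannot use context and cannot match multi-word expressions.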

In order to eliminate any interference due to inter-annotator inconsistency, a problem that Read et al. (2007) pointed out with regard to Appraisal theory, I annotated all manually annotated texts myself.

The tagged texts were tested against manual annotation of the same texts in terms of precision and recall.

3. Results

The recognizer had a precision of 52.97% and a recall of 26.22%. The F-score was 35.08%. It correctly recognized 107 of the 202 total recognized tokens, making mistakes in 95 cases. A complete breakdown of errors can be found in Table 7. Most of the incorrectly recognized tokens were false positives. This was expected because the program had no disambiguation modules or any other tool providing information about context. One of the most common words that created problems in this error category was "just", since the recognizer had no way of knowing when it was used to describe the character of an individual and when it was used in a different sense. Problems were also identified in dealing with negation, since the recognizer is currently unable to handle multi-word expressions or use POS tagging, which led to polarity errors. Errors in type but not in polarity were also present and are due to the lack of knowledge about the appraiser and the appraised.
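The reported figures can be checked with a short calculation. The sketch below assumes that precision is computed over the 202 recognized tokens (Table 7) and recall over the 408 manually annotated Attitude tokens (Table 1); the second denominator is inferred from the tables rather than stated in the text, and the function name is hypothetical.

```python
def precision_recall_f1(correct, recognized, gold):
    """Return precision, recall and their harmonic mean (F-score)."""
    precision = correct / recognized
    recall = correct / gold
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Counts taken from Table 7 (correct, recognized) and Table 1 (gold).
precision, recall, f_score = precision_recall_f1(correct=107, recognized=202, gold=408)
print(f"precision={precision:.4f} recall={recall:.4f} F-score={f_score:.4f}")
```

Under these assumptions the values agree with the percentages given above to within rounding.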

Other errors were due to three main reasons: the term could not be found in the dictionary, the term could be found in the dictionary but a different inflection was used, or the term could be found in the dictionary but it was used in a different sense.

Possible solutions include expanding the training corpus, using lemmatization in order to solve those instances in which a different inflection was used, handling multi-word expressions, making use of a POS tagger output and using a dictionary of collocations.
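Two of these suggestions, handling inflected forms and handling simple negation, can be sketched as follows. Everything here is an illustrative assumption: the suffix list is a crude stand-in for a real lemmatizer, the negator list is hypothetical, and flipping polarity after a negator is a simplification that would not hold for every Appraisal category.

```python
# Hypothetical helper data; none of these lists come from the study.
NEGATORS = {"not", "no", "never"}
FLIP = {"Positive": "Negative", "Negative": "Positive"}
SUFFIXES = ("ies", "es", "s", "ed", "ing", "ly")

SAMPLE_LEXICON = {
    "angry": ("Attitude:Affect:Dis-Satisfaction:Dis-Pleasure", "Negative"),
    "pain": ("Attitude:Affect:Un-Happiness:Misery-Cheer", "Negative"),
}


def candidates(token):
    """Yield the token itself, then crude suffix-stripped variants."""
    yield token
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and len(stem) >= 3:
            yield stem


def recognize(tokens, lexicon):
    """Tag lower-cased tokens, flipping polarity after an immediate negator."""
    hits = []
    for i, token in enumerate(tokens):
        entry = next((lexicon[c] for c in candidates(token) if c in lexicon), None)
        if entry is None:
            continue
        category, polarity = entry
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = FLIP.get(polarity, polarity)
        hits.append((token, category, polarity))
    return hits


inflected = recognize("the pains were real".split(), SAMPLE_LEXICON)
negated = recognize("he was not angry".split(), SAMPLE_LEXICON)
```

A one-token negation window is the simplest possible choice; a POS tagger or a collocation dictionary, as suggested above, would allow a more reliable scope.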

Table 1. Results (Attitude type)

Feature                   Manual annotation   Recognizer   Relative frequency
Total tokens (Attitude)   408                 113          27.70%
Affect                    67                  14           20.90%
Judgement                 69                  10           14.49%
Appreciation              272                 89           32.72%

Table 2. Detailed results - Affect (Authorial evaluation and classification)

Feature                    Manual annotation   Recognizer   Relative frequency
Authorial evaluation       61                  14           22.95%
Non-authorial evaluation   6                   0            0.00%
Un/happiness               14                  5            35.71%
Dis/satisfaction           16                  3            18.75%
In/security                14                  3            21.43%
Dis/inclination            23                  3            13.04%

Table 3. Detailed results - Affect (subclassification)

Feature               Manual annotation   Recognizer   Relative frequency
Misery/cheer          12                  4            33.33%
Antipathy/affection   2                   1            50.00%
Ennui/interest        5                   0            0.00%
Dis/pleasure          11                  3            27.27%
Dis/quiet             10                  1            10.00%
Dis/trust             4                   2            50.00%

Table 4. Detailed results - Judgement

Feature     Manual annotation   Recognizer   Relative frequency
Normality   8                   1            12.50%
Capacity    24                  5            20.83%
Tenacity    7                   2            28.57%
Propriety   22                  2            9.09%
Veracity    6                   0            0.00%
Unclear     2                   0            0.00%

Table 5. Detailed results - Appreciation

Feature                    Manual annotation   Recognizer   Relative frequency
Reaction (impact)          13                  7            53.85%
Reaction (quality)         5                   4            80.00%
Composition (balance)      10                  1            10.00%
Composition (complexity)   41                  12           29.27%
Social valuation           203                 65           32.02%

Table 6. Results - Polarity

Feature             Manual annotation   Recognizer   Relative frequency
Positive attitude   256                 86           33.59%
Negative attitude   152                 27           17.76%
Ambiguous           0                   0            0.00%

Table 7. Recognizer - Errors

Description                   Percentage   Number of tokens
Total                         100%         202
Correctly tagged              52.97%       107
Incorrect type                7.43%        15
Incorrect polarity            3.47%        7
Incorrect type and polarity   2.48%        5
False positives               33.66%       68

4. Conclusion

Although the recognizer is very limited in its current state, building an automated Appraisal recognizer for inscribed Attitude is feasible. The use of disambiguation techniques and POS tagger output will probably improve the overall recall of such a recognizer, as would the other methods recommended in this paper. Different texts will probably require different techniques, though I agree with Wang and Manning (2012) that NBSVMs are robust and adapt to most text types.

However, for an Appraisal recognizer, it would be interesting to include the variations of each word, even misspelled ones, since the texts used in corpus linguistics are produced by different types of users. If we were to analyze in Appraisal terms a corpus of texts produced by SLA students, it is very likely that some words would be misspelled, as would also happen if we were to rely fully on a web corpus.

Acknowledgements

I would like to thank Dr. Mick O'Donnell and Dr. Maite Taboada for their generous contributions.

Appendix A. Scheme used for annotation

appraisal
  APPRAISAL-TYPE: attitude
    ATTITUDE-TYPE
      affect
        AFFECT-TYPE: authorial-evaluation, non-authorial-evaluation
        AFFECT-TYPE2
          un/happiness (UN/HAPPINESS-TYPE): misery/cheer, antipathy/affection
          dis/satisfaction (DIS/SATISFACTION-TYPE): ennui/interest, dis/pleasure
          in/security (IN/SECURITY-TYPE): dis/quiet, dis/trust
          dis/inclination
      judgement
        JUDGEMENT-TYPE: normality, capacity, tenacity, propriety, veracity, unclear
      appreciation
        APPRECIATION-TYPE
          reaction (REACTION-TYPE): impact, quality
          composition (COMPOSITION-TYPE): balance, complexity
          social-valuation
    EXPLICITNESS: inscribed, invoked
    ATTITUDE-POLARITY: positive-attitude, negative-attitude, ambiguous

References

Garg, S., Bloom, K. & Argamon, S. (2006). Appraisal navigator. Proceedings of the SIGIR '06 29th annual international ACM SIGIR conference on research and development in information retrieval (pp.727-727). New York: Association for Computing Machinery (ACM).

Lindquist, H. & Levin, M. (2000). Apples and oranges: on comparing data from different corpora. In Mair, C. and Hundt, M. (2000). Corpus linguistics and linguistic theory. Papers from the Twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20). Freiburg im Breisgau. Amsterdam: Rodopi.

Martin, J.R. and White, P.R.R. (2005). The Language of Evaluation: Appraisal in English. New York: Palgrave.

O'Donnell, M. (2008). Demonstration of the UAM CorpusTool for text and image annotation. Proceedings of the ACL-08: HLT Demo Session (pp.13-16) (Companion Volume). Columbus, Ohio, June 2008. Association for Computational Linguistics (ACL).

Sano, M. (2011). Reconstructing English system of attitude for the application to Japanese: an exploration for the construction of a Japanese dictionary of appraisal. Paper presented at the 38th International Systemic Functional Congress, 25-29 July, University of Lisbon, Portugal.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K. and Stede, M. (2009). Lexicon-based methods for sentiment analysis. Computational Linguistics, Volume 37, Number 2 (pp.267-308). Massachusetts: MIT Press.

Wang, S. and Manning, C. (2012). Baselines and bigrams: simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012). Pennsylvania: Association for Computational Linguistics (ACL).