
Procedia - Social and Behavioral Sciences 198 (2015) 330-338

7th International Conference on Corpus Linguistics: Current Work in Corpus Linguistics: Working with Traditionally-conceived Corpora and Beyond (CILC 2015)

Identifying polarity in financial texts for sentiment analysis: a corpus-based approach

Antonio Moreno-Ortiz a,*, Javier Fernández-Cruz b

a Universidad de Málaga, Facultad de Filosofía y Letras, Málaga 29071, Spain
b Pontificia Universidad Católica del Ecuador Sede Esmeraldas, Escuela de Lingüística Aplicada, Esmeraldas, Ecuador

Abstract

In this paper we describe our methodology to integrate domain-specific sentiment analysis in a lexicon-based system initially designed for general language texts. Our approach to dealing with specialized domains is based on the idea of "plug-in" lexical resources which can be applied on demand. A simple 3-step model based on the weirdness ratio measure is proposed to extract candidate terms from specialized corpora, which are then matched against our existing general-language polarity database to obtain sentiment-bearing words whose polarity is domain-specific.

© 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of Universidad de Valladolid, Facultad de Comercio.

Keywords: sentiment analysis; information retrieval; terminology; specialized languages; financial texts; corpus linguistics

1. Introduction

In the last two decades, sentiment analysis or opinion mining has become an increasingly relevant sub-field within text analytics that deals with the computational treatment of opinion and subjectivity in texts. Most Sentiment Analysis systems have focused on specialized domains, using domain-specific corpora as training data for machine learning algorithms that classify an input text as either positive or negative. Other systems are lexicon-based: sentiment-bearing words and phrases are collected and then searched for during analysis to compute a sentiment index. In this paper we describe our methodology to integrate domain-specific sentiment analysis in a lexicon-based system initially designed for general language texts. Our system has shown reasonably good results across different types of texts, but falls short as the specialization level increases, since sentiment is, to some extent, lexicalized differently in specialized domains.

* Corresponding author. Tel.: +34-9521336670; fax: +34-952131788. E-mail address: amo@uma.es

This work has been sponsored by the Spanish Government under grant FFI2011-25893 (Lingmotif project, http://tecnolengua.uma.es/lingmotif).

doi:10.1016/j.sbspro.2015.07.451


As Das and Bandyopadhyay (2011) explain, despite the formidable body of research generated on the topic of sentiment analysis, available systems are still far from perfect. All systems fall short of accounting for the incredible complexity that speakers of a language are able to express. Of course this is true of all computational tools that deal with natural language semantics and pragmatics. This, however, should only spur us on to make our systems as effective as possible.

Our approach to dealing with specialized domains is based on the idea of "plug-in" lexical resources which can be applied on demand. In order to acquire such resources we employ a simple 3-step model based on the weirdness ratio measure to extract candidate terms from specialized corpora, which are then matched against our existing general-language polarity database to obtain sentiment-bearing words whose polarity is domain-specific.

2. Lexicon-based Sentiment Analysis

The Tecnolengua group started with the development of Sentitext, a linguistically-motivated sentiment analysis system for Spanish (Moreno-Ortiz et al. 2010a, 2010b, 2011), which evolved within the Lingmotif project to also cover English, with automatic language detection. Lingmotif is based on the same principles as Sentitext: a reliance on a comprehensive set of lexical resources rather than a complex set of algorithms.

The analysis process can be briefly outlined as follows (a schematic code sketch is given after the list):

1. The input text is preprocessed, tokenized, lemmatized, and part-of-speech tagged. Multiword expressions are identified and tagged, too.

2. Lexical words and MWEs are looked up in the sentiment lexicons. If found, they are assigned the corresponding valence.

3. Context rules are searched for each lexical word or MWE. Matching segments are assigned the valence resulting from the application of the context rule.

4. Affect intensity (i.e., the proportion of sentiment-carrying vs. neutral units) is calculated.

5. The final Global Sentiment Value (GSV) is calculated.
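The following Python sketch illustrates this flow at a high level. It is a minimal illustration only: the names (Unit, the nlp preprocessor, the lexicon dictionaries, rule.apply) and the final scoring formula are our own placeholders, not the actual Lingmotif implementation.

```python
# Schematic, hypothetical sketch of a lexicon-based pipeline (not the actual
# Lingmotif code); names and the scoring formula are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class Unit:
    lemma: str
    pos: str
    valence: float = 0.0          # -2 .. +2, 0 = neutral

def analyze(text, nlp, lexicon, mwe_lexicon, context_rules):
    # 1. Preprocess, tokenize, lemmatize and POS-tag; MWEs are merged into units.
    units = nlp(text)             # -> list[Unit]

    # 2. Look up lexical words and MWEs in the sentiment lexicons.
    for u in units:
        u.valence = mwe_lexicon.get(u.lemma, lexicon.get((u.lemma, u.pos), 0.0))

    # 3. Apply context rules (contextual valence shifters) to matching segments.
    for rule in context_rules:
        rule.apply(units)         # may intensify, downtone or invert valences

    # 4. Affect intensity: proportion of sentiment-carrying vs. neutral units.
    polarized = [u for u in units if u.valence != 0]
    intensity = len(polarized) / max(len(units), 1)

    # 5. Global Sentiment Value (GSV): here a simple average of valences.
    gsv = sum(u.valence for u in polarized) / max(len(polarized), 1)
    return gsv, intensity
```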

Describing each of these steps in detail falls outside the scope of this paper; details can be found in Moreno-Ortiz et al. (2010a, 2010b, 2011, 2013). It is clear, however, that the system's performance relies on high quality, wide coverage lexical resources. Like Sentitext, Lingmotif uses three major linguistic data sources for each language: the individual words dictionary, the multiword expressions dictionary, and the context rules set, which is our implementation of Contextual Valence Shifters. The individual words dictionary currently contains over 14,000 items, all labeled for valence, which were obtained semi-automatically from several lexical databases and subsequently refined manually using corpora. Lexical items in both dictionaries were assigned a valence marking their orientation and degree (from -2 to 2). The Lingmotif multiword expressions lexicon contains over 20,000 entries. Unlike the single words dictionary, it also contains neutral-polarity items, which are needed in order to block polarity-laden words that are part of them (see Moreno-Ortiz et al., 2013, for a full discussion).

The final key component of our system is the Context Rules database. Simply accounting for negative and positive words and phrases found in a text would not be enough. There are two ways in which their valence can be modified by the immediately surrounding context: the valence can change in degree (intensification or downtoning), or it may be inverted altogether. Negation is the simplest case of valence inversion. The idea of Contextual Valence Shifters (CVS) was first introduced by Polanyi and Zaenen (2006), and implemented for English by Andreevskaia and Bergler (2007) in their CLaC System, and by Taboada et al. (2011) in their Semantic Orientation CALculator

(SO-CAL). To our knowledge, apart from Brooke et al.'s (2009) adaptation of the SO-CAL system, Sentitext is the only sentiment analysis system to implement CVS for Spanish natively.

Context rules are applied after polarity words and multiword expressions have been identified; they account for the many ways in which these can be modified by the immediate context to either intensify their valence or invert it. Our system allows us to define fairly elaborate context rules, for instance for multiword modifiers such as those in (1) and (2) below.

(1) (be) no + negative adjective "He's no fool"

(2) (be) a total + negative adjective "She's a total loser"

A context rule for type (1) constructions would cause the polarity of the negative adjective to be inverted, whereas a rule for type (2) constructions would intensify the valence of the negative adjective.
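A context rule of this kind can be thought of as a left-context pattern plus an operation on the valence of the word it targets. The sketch below is a hypothetical encoding of rules (1) and (2), not the actual Lingmotif rule formalism; the data structure, scaling factor and function names are our own assumptions.

```python
# Hypothetical encoding of the two context rules discussed above (not the
# actual Lingmotif rule formalism). Each rule matches a left context and
# transforms the valence of the following negative word.

RULES = [
    # (1) "(be) no + negative word": invert the valence ("He's no fool")
    {"left": ("be", "no"), "op": lambda v: -v},
    # (2) "(be) a total + negative word": intensify it ("She's a total loser")
    {"left": ("be", "a", "total"), "op": lambda v: v * 1.5},
]

def shift_valences(lemmas, valences, rules=RULES):
    """Apply the first matching rule at each position (illustrative only)."""
    out = list(valences)
    for i, v in enumerate(valences):
        if v >= 0:
            continue                      # these rules only target negative words
        for rule in rules:
            n = len(rule["left"])
            if i >= n and tuple(lemmas[i - n:i]) == rule["left"]:
                out[i] = rule["op"](v)
                break
    return out

# "He is no fool": 'fool' is -1 in the general lexicon, inverted to +1 by rule (1)
print(shift_valences(["he", "be", "no", "fool"], [0, 0, 0, -1]))
# -> [0, 0, 0, 1]
```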

2.1. Domain-dependent Sentiment Analysis

Semantic orientation has been shown to be subject-dependent to a large extent, which is why most Sentiment Analysis research has focused on particular domains for which machine learning algorithms are trained. The problem, then, is adapting, or customizing, a classifier built for a certain domain to a new specialized domain. Different approaches have been proposed to tackle this customization process with different degrees of success. Aue and Gamon (2005) distinguish four different approaches to overcome this issue in document-level classifiers, whereas Choi, Kim and Myaeng (2009) propose a clue-discovery approach for dealing with multiple domains in a sentence-level classifier.

The limitations of Sentitext for domain-specific texts were made evident and discussed in Moreno-Ortiz et al. (2010), where a set of hotel reviews from the Tripadvisor website was analyzed. Within this domain, certain qualities keep the orientation they show in general-language texts, e.g., cleanliness (of rooms) or character (of staff), whereas others acquire a specific orientation, e.g., size (of rooms, beds, etc.).

One simple approach is to tweak the lexical resources, for example by introducing certain recurrent phrases (e.g., "small beds"), but obviously this cannot account for all subject domain issues. The solution provided by Lingmotif is the introduction of specialized lexical resources for subject domains, implemented as plug-ins that can be used optionally. Since our system works at the document level, i.e., it offers an overall sentiment index for the text as a whole, this approach is appropriate. At present the user selects which domain plug-in to use, but text classification techniques could be integrated to do this automatically.

3. Semantic ambiguity vs. polarity

The issue of ambiguity in specialized domains has been extensively dealt with in terminology studies, where it has been a topic of debate. From a traditional perspective (e.g., Wüster), the main task of terminologists is to assign a certain term to a concept in a concept system, unequivocally. As Cabré (1992) points out "in terminology (...) the absence of ambiguity and the single reference of a term to a concept are essential elements for effective communication" (p. 40). The existence of such unequivocal designation in specialized languages is highly questionable, however, as has been pointed out by many researchers.

Such discussions fall beyond the scope of Natural Language Processing, where a strictly pragmatic approach is usually employed. In Sentiment Analysis, ambiguity is only relevant when the different senses, if present, differ in polarity. Whether a certain lexical unit has a specialized meaning or not is irrelevant as long as the senses share the same polarity.

For example, the word "meager" is identified as a specialized term in the realm of finance, since it is very commonly used in expressions such as "meager economic recovery" or "meager 10%". However, there is no need to account for this adjective in our specialized lexicon for financial texts, because it is already contained in the general polarity lexicon and marked as negative, since its connotations are consistently negative: "meager resources / salaries / challenges / advantages".

Our plug-in approach works by overriding the default polarity of lexical items (those found in general language) with the one assigned to those lexical items when they are used in specialized domains. From a lexicology/terminology viewpoint, this may appear as an overly simplistic strategy, but from the language engineering perspective, it is both effective and efficient, as we only need to deal with those particular cases for which polarity differs. A typical example would be the words "growth" and "expansion", which, in the financial domain, are always positive, whereas in general language their polarity is context-dependent: "the expansion of the universe / water / population". The need to have separate, domain-specific lexicons is apparent from words like "growth", which in medicine usually refers to a tumor (whether benign or malignant) and in general language is, again, highly context-dependent.
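In implementation terms, this override can be as simple as layering a domain lexicon over the general one at lookup time. The following minimal sketch assumes that both lexicons map (lemma, POS) pairs to valences; the entries and values shown are illustrative placeholders, not actual Lingmotif data.

```python
# Minimal sketch of the plug-in override, assuming both lexicons map
# (lemma, POS) pairs to valences in the -2..+2 range. Entries and values
# below are illustrative, not the actual Lingmotif data.

GENERAL_LEXICON = {
    ("meager", "JJ"): -1,        # negative in general language
    ("growth", "NN"): 0,         # context-dependent, treated as neutral by default
    ("expansion", "NN"): 0,
}

FINANCE_PLUGIN = {
    ("growth", "NN"): 2,         # always positive in financial texts
    ("expansion", "NN"): 2,
}

def valence(lemma, pos, domain_plugin=None):
    """Domain plug-in entries take precedence over the general lexicon."""
    if domain_plugin and (lemma, pos) in domain_plugin:
        return domain_plugin[(lemma, pos)]
    return GENERAL_LEXICON.get((lemma, pos), 0)

print(valence("growth", "NN"))                      # 0  (general language)
print(valence("growth", "NN", FINANCE_PLUGIN))      # 2  (financial domain)
```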

In our system context dependency is dealt with by combining multiword expressions and context rules, both derived from actual textual data, i.e., corpora. Examples of how this is done are provided in section 6 below.

4. Single-word terminology extraction

Terminology extraction has been an active field of research for many years. Both statistical and linguistic approaches, or a combination of both, i.e. hybrid approaches, have been employed. From a computational perspective, terminology extraction is one of the most relevant topics, as it deals with the automatic recognition of the different units that compose a specific-domain text. Technologies such as ontology learning or the Semantic Web rely on information extracted automatically from different corpora by using tools that detect terms and term relations (Estopá, 2000; Pazienza, Pennachiotti and Zanzotto, 2005; Cabré, Estopá, and Vivaldi, 2001).

Research on terminology extraction has followed all of the aforementioned strategies. Statistical measures have been proposed to determine the degree of termhood of candidate terms. On the other hand, terminologists who follow the linguistic approach establish termhood by exploring the linguistic properties of candidate terms and using traditional linguistic techniques to filter term patterns. Finally, hybrid approaches employ both linguistic and statistical cues to extract terminology. All these systems have in common the analysis of specialized corpora and the extraction of lists of words that need to be double-checked manually (Pazienza et al., 2005; Cabré et al., 2001).

The linguistic approach to term extraction is systematized by Pazienza et al. (2005, pp. 257-258) in three steps:

1. Discard text sequences unlikely to contain terms by using frontier markers, especially verb and pronoun phrases.

2. Apply pre-programmed rules with a syntactic parser, as English terminology is generally based on Noun-Noun, Adjective-Noun and adverbial structures.

3. Filter the results by adding a stop-list with non-desired words.

Pure statistical approaches are generally applied to detect peculiarities and thus extract candidate terms according to a pre-established set of criteria used to filter true and false candidates. In general terms, statistical extractors contrast the token frequencies of a specific-domain and a general-domain corpus, and different types of statistical metrics are combined with heuristic techniques, namely:

• The degree of association, which estimates the likelihood of two adjacent words forming a collocation (e.g., MI and the Dice coefficient).

• The significance of association, which is aimed at avoiding errors of estimation (e.g., z-score, t-score, chi-square and log-likelihood ratio).

• Heuristic metrics, which are based on empirical and intuitive assumptions that frequently lack statistical justification (Pazienza et al., 2005, pp. 258-262; Cabré et al., 2001, p. 54).

Hybrid approaches combine techniques from both linguistic and statistical approaches. For instance, the results obtained from a linguistic extraction are refined by using statistical measures. The final results are considered more reliable, as the results of the linguistic extraction are ranked and classified according to the definitions of termhood and unithood (Pazienza et al., 2005, pp. 271-273).

At present, our work focuses on single-word terminological units. Thus, pure statistical approaches are appropriate and have offered good extraction results. In order to build our extractor, we have developed a three-step model based on the algorithm proposed by Gillam and Ahmad (2006), which compares the weirdness ratio (R) of lexical words found in the sample specialized and general language corpora. Domain-specific words are identified by a high weirdness ratio, i.e., a relative frequency in the specialized corpus that is considerably higher than their relative frequency in the general language corpus.

5. Method

As our specialized language corpus, we have used the "Mag-Finance" and "News-Money" sections of the Corpus of Contemporary American English (COCA) (Davies, 2008-), which comprise approximately 7.97 million words. Our general language corpus was the Corpus of Global Web-Based English (GloWbE) (Davies, 2013), of approximately 1,900 million words.

As a model of reference, we selected two texts from the economic news section. Sample A is titled "Spanish Unemployment Stays At Elevated Level Despite Economic Recovery". Sample B is titled "Spain's Unemployment Rate Inches Up, Checking Talk of Recovery" (see Appendix A for full references). Both texts date from January 2014 and deal with how the end of the recession brings better macroeconomic results in general despite high unemployment figures. The reasons for selecting these samples were the following:

• Similar length: the global evaluation algorithm of Sentitext takes into account the total number of words along with the number of sentiment-laden lexical items.

• Written by native speakers: the use of expressions may vary between native and non-native authors.

• Same topic: this allows us to analyze how two texts dealing with the same topic can adopt different points of view, which may produce a totally different semantic orientation.

Quantitatively, the samples are defined by the following data:

• Sample A: types: 210; tokens: 384; TTR: 0.546875

• Sample B: types: 233; tokens: 475; TTR: 0.490526

As for the extraction algorithm, we employed the original proposal, which proceeds as follows (a code sketch of these steps is given after the list):

1. Generate lemmatized word lists and frequencies from both the specialized language (SL) corpus and the general language (GL) corpus.

2. Apply a stop list to remove any non-lexical words.

3. Calculate relative frequencies and weirdness ratio for each lemma:

R(\text{word}) = \frac{f_{SL}(\text{word})}{f_{GL}(\text{word})} \qquad (1)

4. For each word, calculate the variance of its weirdness ratio and its relative frequency in the specific language corpus:

v(\text{word}) = \operatorname{var}\left(R(\text{word}),\, f_{SL}(\text{word})\right) \qquad (2)

5. Decide the variance threshold that delimits terms and non-terms. This threshold can be decided in advance, but in our case it was set at 3.6, since this value appeared to offer the best balance between precision and recall.
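A minimal sketch of this extraction procedure is given below. It assumes lemma-to-frequency dictionaries have already been produced for both corpora; the stop-list handling and the treatment of unseen lemmas are our own assumptions rather than a description of the actual extractor.

```python
# Sketch of the candidate-term extraction described above (illustrative only).
# freq_sl / freq_gl: lemma -> raw frequency for the specialized (SL) and
# general language (GL) corpora.

import numpy as np

def extract_candidates(freq_sl, freq_gl, stop_list, threshold=3.6):
    n_sl = sum(freq_sl.values())
    n_gl = sum(freq_gl.values())
    candidates = {}
    for lemma, f in freq_sl.items():
        # Step 2: stop list; lemmas unseen in the GL corpus are skipped here
        # to avoid division by zero (a smoothing constant could be used instead).
        if lemma in stop_list or lemma not in freq_gl:
            continue
        rel_sl = f / n_sl                      # relative frequency in SL corpus
        rel_gl = freq_gl[lemma] / n_gl         # relative frequency in GL corpus
        weirdness = rel_sl / rel_gl            # equation (1)
        score = np.var([weirdness, rel_sl])    # equation (2)
        if score > threshold:                  # step 5: variance threshold
            candidates[lemma] = (weirdness, score)
    return candidates
```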

It is important to remark at this point that we are not interested in extracting all terms, only those that indicate positivity or negativity within the particular domain, and, more importantly, only when their orientation differs from the one they exhibit in general language or other domains. From this perspective, terms such as analyst, sale or investor are irrelevant to us, since they are neutral. Similarly, polarity-laden terms that are already present in the general language polarity database do not need to be accounted for (e.g., recovery, unemployment).

These aspects can only be resolved by qualitative analysis of the results produced by the quantitative approach just described. Our procedure to identify relevant terms is thus the following:

1. Check the semantic orientation of each candidate term by analyzing them in context.

2. Discard neutral terms, i.e., those whose meaning does not convey any particular semantic orientation.

3. Match the list of polarized terms against our existing list of polarized words.

4. Discard terms whose polarity matches (both in orientation and intensity) our existing general-language words.

5. Approve the remaining terms as specialized lexicon items.

6. Results and discussion

Applying this procedure we obtained a list of 100 term candidates from the two sample texts (see Appendix A). The algorithm succeeded in detecting 83 true positive terms and returned 15 false positives.

True positives (83):

analyst, bailout [-], bank (n), bond, boost [+], bubble [-], capital, carryover [-], cautious [-], charge (v) [-], commission, consolidation [+], construction, contract (v), crisis [-], current, debt [-], decline (n) [-], demand, domestic, drop (v) [+/-], economics, economist, economy, estimate (n), estimate (v), expand (v) [+], expansion [+], expect (v), exports, fall (v) [+/-], finance, financial, flat (j) [-], forecast (v), fuel (v) [+], gain [+], gross, grow [+], growth [+], hold, import (n), inch (v) [+/-], increase (v) [+/-], industry, inflation [-], investment, investor (n), job (n), jobless [-], long-term, market (n), meager [+], net (j), office, percent, president, private, product, quarter, rate (n), recent, recession [-], recovery [+], reduction [+/-], reform (v) [+], rise [+/-], sale, service (v), share (n), shrink [-], signal (v), slow [-], slump [-], spending (n), take (v), tight [-], unemployment [-], vice (n), wane [-], workforce, year, yield (v)

False positives (15):

bleak [-], bumper (j), largely, last, million, month, new, pace, prompt (v), push (v), say (v), stem (v), talk (v), turnaround, union

Further manual analysis determined that 24 terms remained undetected. Thus, the algorithm returned a precision of 84.69 percent and a recall of 77.57 percent. With 125 true negatives, accuracy was also very good: 84.21 percent.
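These figures follow directly from the standard definitions, given 83 true positives, 15 false positives, 24 false negatives and 125 true negatives:

\text{Precision} = \frac{83}{83 + 15} \approx 84.69\%, \qquad \text{Recall} = \frac{83}{83 + 24} \approx 77.57\%, \qquad \text{Accuracy} = \frac{83 + 125}{247} \approx 84.21\%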

False negatives (24):

activity, aftershock, austerity, consumption, economic (j), effect, emerge (v), employment, expectation, fragile [-], government, household, immigrant (n), labourers, objective (n), official, population, program, property, result (v), sector, statistical, statistics, sustainability [+], unsustainable [-]

The next step is the qualitative analysis mentioned above. Strictly speaking, we should run a concordance for each candidate term, but this would unnecessarily slow down the acquisition process. Words such as analyst, bond or exports are clearly neutral, whereas words such as crisis, debt, and inflation clearly have a (negative) semantic orientation. We proceed by handpicking those words likely to convey some semantic orientation, and check our assumptions against the textual data.

This step gives us a number of terms that convey a certain semantic orientation: negative, positive or both, depending on context. We have marked these items in the list above with minus [-], plus [+], or both [+/-].

Now we perform a matching query of these items on our general language lexicon. It is important to note that the matching is not performed solely on the lemma, but on the lemma/part-of-speech combination. Table 1 below summarizes the actual results.

Table 1. Summary of results

• Result: the (affect-laden) candidate term does not exist in the general-language lexicon. Action: the term is entered in the specialized sentiment lexicon. Examples: bailout, boost, carryover, expand, expansion, flat, shrink, slow, slump, tight, wane.

• Result: the candidate term is present in the general-language lexicon with the same orientation. Action: the term is discarded as redundant. Examples: consolidation, aftershock, crisis, debt, decline, recovery, reform, unemployment, unsustainable, unsustainability.

• Result: the candidate term is present in the general-language lexicon with a different orientation. Action: the term is entered in the specialized lexicon with the specialized polarity. Examples: no cases were identified.

• Result: the candidate term functions as a sentiment modifier. Action: the term is formalized as a context rule, if possible. Examples: inch, rise, fall, drop, increase, reduction.
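The decision procedure summarized in Table 1 can be sketched as a simple lookup keyed on the lemma/part-of-speech combination. The dictionary entries and function below are illustrative assumptions, not the actual Lingmotif database interface.

```python
# Illustrative lemma/POS matching against the general-language polarity
# lexicon, mirroring the decisions in Table 1. Entries are placeholders.
# (The sentiment-modifier case in Table 1 is handled separately, as context rules.)

GENERAL = {("crisis", "NN"): -2, ("recovery", "NN"): 2}     # sample entries

def classify_candidate(lemma, pos, domain_valence, general=GENERAL):
    general_valence = general.get((lemma, pos))
    if general_valence is None:
        return "enter in specialized lexicon"
    if general_valence == domain_valence:
        return "discard as redundant"
    return "enter in specialized lexicon with specialized polarity"

print(classify_candidate("bailout", "NN", -2))   # not in general lexicon -> enter
print(classify_candidate("crisis", "NN", -2))    # same polarity -> redundant
```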

Some of the items selected for addition to the specialized affect lexicon might appear surprising. Most readers, by now quite familiar with the term "bailout", might wonder why it should not simply be included in the general-language lexicon, since it is bound to always be negative. In fact, this word is synonymous with assistance, which is positive. However, in finance it has evolved to be unequivocally associated with financial problems, nearly equating it with bankruptcy. And in other specialized domains, such as scuba diving, where it refers to a special kind of tank used as an air-providing backup when the main system fails, it is a decidedly positive (indeed, a godsend) concept.

The same kind of specialized semantic orientation can be seen in the verb boost, which, unlike other intensity-changing verbs (see below), always refers to a positive increase (of sales, productivity, etc.). In the same vein, the words expand and expansion can have any polarity in general language, but are always positive in finance. Grow, growth, and tight appear to exhibit the same behavior. Something similar happens with the term carryover, which means "a quality passed on from a previous situation": although in other domains the transfer can be of any kind, in finance the transferred quality appears to be always negative.

Other words, such as shrink and wane, are usually, but not always, negative in general language, since they can refer to a change in size without any negative connotations. As antonyms of growth, however, in the financial domain they always refer to unwanted results.

This process is not without its issues, though. Whereas some lexical items clearly belong in specialized discourse, others do not: the identified term recession, for example, is widely used in the domain of finance with a marked semantic orientation, but it is also found in the general language lexicon, so there is no need to include it in the specialized lexicon. This was the case with many other terms: once limited to the specialized lexicon, in recent years, surely due to the global financial situation, they have increasingly made their way into the general lexicon, as they have been regularly used in general-audience media and given special attention by the general public, who is now quite (and sadly) familiar with terms such as housing bubble or credit crunch.

This raises the question whether such words should be entered in the general or specialized affect lexicons. This is not simply a lexicographical issue, but one that does have an impact on system performance in our case. For example, if the polarized sense of bubble is entered in the general-language lexicon, every single occurrence of it, including those with the physical sense, would be analyzed as negative.

Finally, our procedure allows us to identify not only items for the single words lexicon, but also some that generate context rules. The verb rise is a good example:

(3) He predicts that inflation will rise to an annual rate of 3 or 4 percent in coming years

(4) An equity strategist with Standard and Poor's thinks the S&P 500 index could rise another 8 percent over the next 12 months.

In these examples rise functions as an orientation intensifier, whatever that orientation happens to be: negative in (3) and positive in (4). A context rule can be constructed to reflect this behavior, as sketched below. Other such verbs (and deverbal nouns) are fall, drop, increase, inch and reduction.
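A rule of this kind only needs to record that the trigger word strengthens whatever valence its argument already carries. The encoding below is hypothetical, with an invented scaling factor; analogous rules could be written for the other verbs and deverbal nouns listed above.

```python
# Hypothetical context-rule behavior for "rise" as an orientation intensifier:
# it scales the valence of its argument while keeping the sign. The factor
# 1.5 is invented for illustration.

def intensify(argument_valence, factor=1.5):
    """Strengthen the argument's valence, preserving its orientation."""
    return argument_valence * factor

print(intensify(-1))   # (3) "inflation will rise"       -> -1.5 (more negative)
print(intensify(+1))   # (4) "the index could rise ..."  -> +1.5 (more positive)
```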

7. Conclusion

Linguistically motivated Sentiment Analysis has its advantages and disadvantages. Among the latter, domain specificity is probably the most obvious issue. As we have shown, words show a certain semantic orientation depending not only on the immediate linguistic co-text, but also on the subject domain. Whereas sentence-level context can be dealt with by means of the introduction of context valence shifters (context rules), text-level valence resolution requires other mechanisms. Our plugin approach, even if manually activated, seems to be appropriate.

As for the method for affect-laden term identification, the combination of well-established statistical term extraction methodologies and semi-automatic filtering appears to be well balanced in terms of cost-effectiveness. Even using such a small sample, we have been able to identify a fair number of domain-dependent affect-laden lexical items and context rules. We plan to expand and systematize this work in order to cover the financial domain.

As for affect modifiers, it is questionable whether they are specific to the financial domain. We did consider creating a specific repository for specialized rules, but, for the time being, we are storing them in the same database. In fact, we believe there may be no need to separate them into a different pluggable resource, since they are all applicable to general language too. The difference, as with many affect words, may be just one of frequency: they appear more often in certain domains than they do in general language.

Finally, our work has not only offered practical results for the acquisition of our lexicons and the implementation of our system; it has also allowed us to gain insight into specialized languages from a perspective that usually receives little attention in the literature: the expression of emotion, an aspect that, as seems apparent, permeates language, and languages.

Appendix A. Samples

Day, P. (2014, January 23). Spain's unemployment rate inches up, checking talk of recovery. Reuters UK, Madrid. Retrieved from http://uk.reuters.com/article/2014/01/23/uk-spain-economy-unemployment-idUKBREA0M0HK20140123

RTT Staff. (2014, January 23). Spanish unemployment stays at elevated level despite economic recovery. Retrieved February 10, 2014, from http://www.rttnews.com/2255939/spanish-unemployment-stays-at-elevated-level-despite-economic-recovery.aspx

References

Andreevskaia, A., and Bergler, S. (2007). CLaC and CLaC-NB: knowledge-based and corpus-based approaches to sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations (pp. 117-120). Prague: Association for Computational Linguistics.

Aue, A., and Gamon, M. (2005). Customizing sentiment classifiers to new domains: a case study. Paper presented at Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria.

Brooke, J., Tofiloski, M., and Taboada, M. (2009). Cross-linguistic sentiment analysis: from English to Spanish. In G. Angelova et al. (Eds.), Proceedings of RANLP 2009, Recent Advances in Natural Language Processing (pp. 50-54). Borovets, Bulgaria.

Cabré, M. T. (1992). Terminology: theory, methods, and applications. Barcelona: John Benjamins.

Cabré, M. T., Estopá, R., and Vivaldi, J. (2001). Automatic term detection: a review of current systems. In D. Bourigault, C. Jacquemin, and M. L'Homme (Eds.), Recent advances in computational terminology (pp. 53-89). John Benjamins.

Choi, Y., Kim, Y., and Myaeng, S.-H. (2009). Domain-specific sentiment analysis using contextual feature generation. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion (pp. 37-44). Hong Kong, China: ACM.

Das, A., and Bandyopadhyay, S. (2011). Dr Sentiment knows everything! In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations (pp. 50-55). Stroudsburg: Association for Computational Linguistics.

Davies, M. (2008). The Corpus of Contemporary American English: 450 million words, 1990-present. Retrieved from http://corpus.byu.edu/coca/

Davies, M. (2013). Corpus of Global Web-Based English: 1.9 billion words from speakers in 20 countries. Retrieved from http://corpus.byu.edu/glowbe/

Estopá, R. (2002). Extracción de terminología: elementos para la construcción de un extractor. Tradterm, 7, 225-250.

Gillam, L., and Ahmad, K. (n.d.). Financial data tombs and nurseries: a grid-based text and ontological analysis.

Lonsdale, D., McGhee, J., Wood, N., and Anderson, T. (2013). Semantic memory for syntactic disambiguation. In R. L. West, and T. C. Stewart (Eds.), Proceedings of the 12th International Conference on Cognitive Modeling (pp. 378-383). Ottawa: Carleton University.

Moreno-Ortiz, A., Pérez-Hernández, C., and Del-Olmo, M. (2013). Managing multiword expressions in a lexicon-based sentiment analysis system for Spanish. In V. Kordoni, C. Ramisch, and A. Villavicencio (Eds.), Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013) (pp. 1-10). Atlanta, Georgia, USA: Association for Computational Linguistics.

Moreno-Ortiz, A., Pérez-Hernández, C., and Hidalgo-García, R. (2011). Domain-neutral, linguistically-motivated sentiment analysis: a performance evaluation. In M. L. Carrió Pastor, and M. A. Candel Mora (Eds.), Actas del 3o Congreso Internacional de Lingüística de Corpus. Tecnologías de la información y las comunicaciones: presente y futuro en el análisis de corpus (pp. 847-856). Valencia: Universitat Politècnica de València.

Moreno-Ortiz, A., Pérez Pozo, Á., and Torres Sánchez, S. (2010). Sentitext: sistema de análisis de sentimiento para el español. Procesamiento del Lenguaje Natural, 45, 297-298.

Moreno-Ortiz, A., Pineda Castillo, F., and Hidalgo García, R. (2010). Análisis de valoraciones de usuario de hoteles con Sentitext: un sistema de análisis de sentimiento independiente del dominio. Procesamiento del Lenguaje Natural, 45, 31-39.

Polanyi, L., and Zaenen, A. (2006). Contextual valence shifters. In J. G. Shanahan, Y. Qu, and J. Wiebe (Eds.), Computing attitude and affect in text: theory and applications (pp. 1-10). Dordrecht: Springer.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.