
Available online at www.sciencedirect.com


Procedia Computer Science 93 (2016) 1046 - 1053

6th International Conference on Advances in Computing & Communications, ICACC 2016, 6-8 September 2016, Cochin, India

A framework to formulate customer taste from unstructured review data

Bhaskarjyoti Das(a), Prathima V R(b, *)

(a) Dept. of Computer Science and Engineering, Rajiv Gandhi Institute of Technology, Bangalore 560032
(b) Dept. of Computer Science and Engineering, Rajiv Gandhi Institute of Technology, Bangalore 560032

Abstract

In the online marketplace, taste brings customers and businesses together. While a business can be viewed as selling products that implement specific tastes, customers' buying decisions are also driven by taste. This paper attempts to model users by formulating customers' tastes. Part of a customer's taste is explicit in the online review portal's data, but the footprint of taste left behind in unstructured text reviews is implicit. While the explicit part is relatively easy to capture, formulating the implicit part is challenging because text reviews are unstructured. In our approach, formulating implicit taste is treated as both an information retrieval problem and an annotation problem. The proposed framework addresses a blind spot in current content-based recommendation techniques, which work well for businesses selling products such as televisions or personal computers but not for business domains, such as restaurants, that have no clearly defined feature set. The framework promises to bring more precision to current mechanisms for marketing, recommendation and community building in such domains. This paper describes the framework and explains how a specific use case, a recommendation system, can derive value from it.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Organizing Committee of ICACC 2016.

Keywords: Taste; key phrase; annotation; online review; graph theory; recommendation system.

* Corresponding author. Tel.: 91 9945007248. E-mail address: bhaskarjyoti01@gmail.com (Bhaskarjyoti Das)

1877-0509 © 2016 The Authors. doi:10.1016/j.procs.2016.07.308

1. Introduction

One can argue that products become successful because they implement tastes successfully, and that it is taste that gets sold and bought. In an online review forum, where there is no obligation to write anything, a customer voluntarily expresses opinions mostly about things he truly cares about, i.e., things that either match or clash with what he likes. This is how a customer leaves a footprint of his taste in online reviews. This part of the taste is implicit, while the explicit part consists of preferences made evident by the categories of business the customer chooses. Both are needed to understand a customer's taste. A similar approach can be taken to formulate the tastes that a business successfully offers. The framework proposed in this paper extracts the explicit taste vector from the online review portal's data and forms the implicit part of the taste vector by mining the text reviews. Forming the implicit taste vector is challenging because of the unstructured nature of review text.

The problem we have described has no ready-to-implement solution. Online review portals for this kind of business typically serve merely as directories of businesses. As a result, although the volume of text reviews is growing rapidly, its utilization is not. These portals provide some kind of average rating, but the limitations of such a calculated average are well known. Customers end up reading through a huge volume of text reviews before making a purchasing decision and still do not feel confident. To address this gap, the portals encourage community building through features such as "fan" or "friend" that reviewing customers are expected to use. Typically, however, the fraction of customers who use these features is very small compared with the overall number of customers using the review portals, so this indirect, customer-driven approach to community formation does not scale well enough for businesses to take advantage of it. A business with a similar requirement typically tries to build its own customized solution. Here we propose a more direct approach through the formulation of taste.

Another challenge for academic research in this domain is that it is hard to find an academic dataset from a similar domain. In our research, we have therefore taken an approach based on prototyping with a real-life dataset. The rest of the paper is organized as follows: first, we summarize the existing work that we build upon; second, we describe the proposed framework, followed by suitable application examples; finally, we summarize our work and indicate the next steps in our research.

2. Existing work

2.1. User modelling

Our work closely relates to the domains of user modelling and recommendation systems. User modelling refers to the way an application adapts itself based on its understanding of its users. The adaptation can be made in terms of interface and content; interface adaptation is commonly seen in application software. Recommendation systems¹ are examples of user modelling that adapts content, and they are closely related to the concept of user taste. Given a set of items $I$ and a set of users $U$, the job of a recommendation system is, for each user $u \in U$, to recommend the items that maximize a utility function² $p: I \times U \to O$, where $O$ is a totally ordered set (e.g., ratings), i.e., to find $i^*_u = \arg\max_{i \in I} p(i, u)$. Since $p$ is not defined over the whole $I \times U$ matrix, the recommendation system has to fill in the blank entries of this matrix by recommending items that the user was not aware of or did not know how to ask for. This is also the difference between search and recommendation. The offline part of a recommender pre-processes data and builds the appropriate models, whereas the online part makes predictions and the final recommendation after considering the current context. The models themselves can employ memory-based heuristics or machine learning techniques, and they are built primarily around the notion of a neighborhood: user-based collaborative filtering uses user neighborhoods, item-based collaborative filtering uses item neighborhoods and content-based recommendation uses content neighborhoods. In a practical recommendation system, a hybrid implementation combines multiple methods to overcome the weaknesses of any single method.

Researchers have extended collaborative recommendation further to social networks. Sahebi and Cohen³ showed that collaborative filtering works better within communities that are subsets of all users. Konstas et al.⁴ showed that friendship relations in a social network can be used in recommendation systems and can outperform simple collaborative methods. Nguyen and Riedl⁵ evaluated the performance of tag-based recommendation algorithms for neighborhoods defined by tags, using the MovieLens dataset, which has both explicit user demographic information and user-defined tags. Uwe Malinowski et al.⁶ defined a taxonomy for user modelling in terms of models, knowledge acquisition and interface integration. In terms of this taxonomy, models can be content-based, collaborative, hybrid, memory-based or based on machine learning. We have adopted a content-based approach that targets individual users rather than stereotypes.
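To make the idea of filling in the blank entries of the utility matrix concrete, the following is a minimal Python sketch of user-based collaborative filtering, one of the neighborhood methods mentioned above. The toy ratings, the function names and the use of cosine similarity over co-rated items are illustrative assumptions; this is not part of the framework proposed in this paper.

```python
import math

# Hypothetical toy utility matrix: known ratings p(item, user), stored per user.
RATINGS = {
    "alice": {"pizza_place": 5.0, "sushi_bar": 3.0, "taqueria": 4.0},
    "bob": {"pizza_place": 4.0, "sushi_bar": 2.0, "noodle_house": 5.0},
    "carol": {"sushi_bar": 5.0, "taqueria": 1.0, "noodle_house": 2.0},
}


def cosine(u, v):
    """Cosine similarity of two users' rating dicts, computed over co-rated items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings, k=2):
    """Estimate a blank entry p(item, user) as the similarity-weighted average of the
    ratings given to `item` by the k most similar users (the user's neighborhood)."""
    neighbors = sorted(
        ((cosine(ratings[user], ratings[other]), other)
         for other in ratings if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    num = sum(sim * ratings[other][item] for sim, other in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    return num / den if den else None


def recommend(user, ratings):
    """Rank the items `user` has not rated by predicted utility, highest first."""
    all_items = {i for r in ratings.values() for i in r}
    scored = [(predict(user, i, ratings), i) for i in all_items - set(ratings[user])]
    return sorted((s for s in scored if s[0] is not None), reverse=True)


print(recommend("alice", RATINGS))
```

Here Alice has not rated `noodle_house`, so its utility is predicted from Bob's and Carol's ratings, each weighted by how closely their co-rated scores resemble hers.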

Laura Dietz et al.⁷ showed that a visible social network is a combination of many invisible networks in various dimensions and that, by de-layering a social network using user content, one can outperform methods such as LDA (Latent Dirichlet Allocation) in predicting similarity. Our work likewise uses a de-layered social network, i.e., it considers only the aspect of taste related to food. Hugo Liu et al.⁸ presented an implementation of a multidimensional taste fabric by mining the social network texts of 100,000 users. The user profile in their work was a bag of phrases from various interest categories (books, music, etc.), which was then normalized using extensive hand-made ontologies (interest and identity descriptors) for all the interest domains. Machine learning techniques were subsequently used for correlation analysis to determine the semantic relatedness of these descriptors. A composite multidimensional taste fabric was then derived and evaluated with conventional recommendation algorithms. However, such an effort is hard to replicate and maintain for every online domain, since these domains are constantly evolving. Eva Jaho et al.⁹ presented the ISCoDe framework for detecting communities based on interest similarity. Interest similarity is first assessed from user-defined tags and used to create a graph with the similarity measure as edge weights; standard community detection algorithms are then applied to detect communities. Magnini and Strapparava¹⁰ showed that semantic analysis of the websites users frequent can be used to build an accurate recommender system.
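The ISCoDe-style pipeline described above (tag-based interest similarity as edge weights, followed by community detection) can be illustrated with the sketch below. It is a simplification, not the ISCoDe implementation: it assumes Jaccard similarity between tag sets and a hypothetical edge threshold, and it uses connected components as a crude stand-in for a proper community detection algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two tag sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(user_tags, threshold=0.3):
    """Undirected weighted graph {user: {neighbor: weight}}; an edge is kept only
    when the tag similarity reaches the (hypothetical) threshold."""
    graph = {u: {} for u in user_tags}
    for u, v in combinations(user_tags, 2):
        weight = jaccard(user_tags[u], user_tags[v])
        if weight >= threshold:
            graph[u][v] = graph[v][u] = weight
    return graph


def communities(graph):
    """Connected components found by depth-first search; a crude stand-in for a
    real community detection algorithm (e.g. modularity optimization)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        groups.append(group)
    return groups


# Hypothetical user-defined tags.
tags = {
    "u1": {"curry", "biryani", "naan"}, "u2": {"curry", "naan"},
    "u3": {"sushi", "ramen"}, "u4": {"ramen", "sushi", "udon"},
    "u5": {"vegan"},
}
print(communities(similarity_graph(tags)))  # groups: u1+u2, u3+u4, u5
```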

In our work, we performed content-based user modelling by capturing taste. Our work also aims to build a taste fabric, but without hand-crafting an ontology. It focuses on only one dimension, or layer, of the social network, namely the chosen domain of food. Our method is not a collaborative method of user modelling: the target domain is not rich in friendship relations, so explicit social network linkages cannot be exploited. It addresses the scenario where neither taste-based tags nor taste-based communities are explicitly defined, and it uses a novel approach to discover tags from review texts. We have also taken into account the importance of semantic analysis of the review texts when modelling users. Our work does not address the temporal aspect of the user model evolving over time; some researchers have proposed latent factorization based models for the evolution of user expertise. Instead, our work takes a snapshot of past text reviews and models the users from it.

2.2. Key phrase extraction

We reviewed two existing bodies of work while building our taste framework: research on key phrase extraction and research on automatic annotation. Key phrase extraction was reviewed because finding the key topics in unstructured text is a necessary first step toward finding the main topics in customer reviews. Automatic annotation methods were reviewed to examine how the identified key phrases can be further compressed into a set of tags that forms an implicit taste vector.

For key topic extraction from unstructured text, the common approach, irrespective of the domain, involves finding the words or phrases representing different topics and sorting them by some criterion to find the most important ones. The first step, candidate generation, is based on heuristics: typically, syntactic metadata¹¹ is applied to the text to find parts-of-speech patterns of interest. For example, an adverb-adjective-noun pattern is expected to contain meaningful opinions. Similarly, sentiment analysis techniques¹² can be used to extract subjective text of positive or negative polarity, which is likely to contain useful opinions; by comparison, objective text with neutral polarity may not be useful. This first step yields many candidate key phrases. The next step is more challenging, as one needs to select key phrases that are coherent¹³ and to account for the same meaning being conveyed by different sets of words. The domain also plays a role. In the scientific domain, the listed keywords and the abstracts help in finding the key phrases¹⁴. In web search, server search logs may provide important clues. In product domains such as personal computers and phones, training datasets are available, which opens up the possibility of supervised key phrase detection¹⁵. However, in domains such as the restaurant business, a supervised approach is not feasible and one has to fall back on unsupervised methods.
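A minimal sketch of POS-pattern candidate generation follows. It assumes the text has already been tagged with Penn Treebank part-of-speech tags (in practice a tagger such as NLTK's `pos_tag` would supply them) and keeps runs matching an optional adverb, one or more adjectives and one or more nouns. The tagged sentence is a made-up example.

```python
def is_adverb(tag):
    return tag.startswith("RB")      # RB, RBR, RBS


def is_adjective(tag):
    return tag.startswith("JJ")      # JJ, JJR, JJS


def is_noun(tag):
    return tag.startswith("NN")      # NN, NNS, NNP, NNPS


def candidate_phrases(tagged):
    """Extract (optional adverb) + adjective(s) + noun(s) runs from (word, tag) pairs."""
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if is_adverb(tagged[j][1]):
            j += 1
        adj_start = j
        while j < n and is_adjective(tagged[j][1]):
            j += 1
        noun_start = j
        while j < n and is_noun(tagged[j][1]):
            j += 1
        if adj_start < noun_start < j:   # at least one adjective and one noun matched
            phrases.append(" ".join(word.lower() for word, _ in tagged[i:j]))
            i = j
        else:
            i += 1
    return phrases


sentence = [
    ("The", "DT"), ("really", "RB"), ("spicy", "JJ"), ("curry", "NN"), ("was", "VBD"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("slow", "JJ"), ("table", "NN"),
    ("service", "NN"), ("annoyed", "VBD"), ("us", "PRP"),
]
print(candidate_phrases(sentence))  # ['really spicy curry', 'slow table service']
```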

Unsupervised approaches can themselves be divided into several categories: statistical, dimensionality reduction, clustering, graph theoretical and mixed strategies. In the statistical approach, word frequency is the basic building block, and statistical co-occurrence among words can be used to build the next level of information¹⁶. External resources such as Wikipedia can also be used to determine the relative importance of candidate key phrases. Statistical approaches yield reasonably acceptable results, but they struggle to capture the semantic aspect of the text. In the clustering approach, information units are clustered to find exemplar terms¹⁷, which can then be used to find key phrases; external resources such as Wikipedia can sometimes help cluster these information units, but clustering quality is always a hard problem. Dimensionality reduction techniques have either linear algebraic or probabilistic foundations. Linear algebra based methods such as Latent Semantic Analysis (LSA) and non-negative matrix factorization (NMF), and probabilistic methods such as LDA (Latent Dirichlet Allocation)¹⁸, yield key topics as sets of words, but it takes a lot of trial and error to arrive at topics that represent a theme. The best known of the graph theoretical approaches is TextRank¹⁹. When TextRank is used for text summarization, sentences are the nodes of the graph and undirected edges represent the similarity between sentences. For keyword extraction, the nodes are words instead, and edges connect words that co-occur within a window of a few words. There are also mixed strategies built around TextRank. TopicRank²⁰ is one such strategy: it groups candidate key phrases into topics and ranks the topics with a TextRank-style graph. CollabRank²¹ first clusters the documents and then extracts key phrases using a TextRank-style graph built over each cluster.
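The keyword variant of TextRank described above can be sketched in pure Python: words are nodes, an undirected edge joins words that appear within a small window of each other, and a PageRank-style power iteration scores the nodes. The window size of 2, damping factor of 0.85 and fixed iteration count are assumptions, and a real pipeline would first filter candidates by part of speech; this is an illustration, not the reference implementation.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, d=0.85, iterations=50, top_k=3):
    """Rank words with TextRank over an unweighted word co-occurrence graph."""
    # Undirected edge between words that co-occur within `window` consecutive tokens.
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    # Power iteration of S(v) = (1 - d) + d * sum over neighbors u of S(u) / deg(u).
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            v: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = "curry spicy curry mild curry rich curry".split()
print(textrank_keywords(tokens))  # ['curry', 'mild', 'rich']
```

The hub word "curry" co-occurs with every other word, so it collects the most score; the three leaves tie and are ordered alphabetically.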

2.3. Automatic annotation

The second part of our proposed framework attempts to represent the semantic content of the identified key phrases by suitable annotations. Tags, whether explicitly or implicitly defined, can play a big role in electronic commerce. Olena Medelyan et al.²² showed that automatically extracted keywords can perform as well as human-assigned tags. An annotation is applied to a web document for various purposes: classifying the document into a category or sub-category, identifying its sentiment polarity, identifying the skill level required of the reader, identifying its overall concept, or simply providing a representative summary. Much of the existing research on automatic tag recommendation relies heavily on domain-specific ontologies²³: target documents are parsed and then compared with the ontology schema of the target domain to recommend a tag. The ontology's parent-child hierarchy is used to recommend a tag that represents a parent concept, and, to eliminate duplication, the common parent of two competing child tags is chosen from the ontology tree. Most ontologies are hand-coded, though some are acquired from Internet-based sources. In the online review domain, arriving at annotations that represent a customer's taste is more of a summarization problem, one that must still retain some of the details. While this ontology-based generalization approach works well for annotating a document, it does not work well when the annotation is meant to represent a taste, which is often very specific.
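Choosing the common parent of two competing child tags amounts to a lowest-common-ancestor lookup in the ontology tree. A minimal sketch, assuming the ontology is available as a hypothetical child-to-parent dictionary:

```python
def ancestors(tag, parent):
    """Path from `tag` up to the root (inclusive), following a child -> parent map."""
    path = [tag]
    while tag in parent:
        tag = parent[tag]
        path.append(tag)
    return path


def common_parent(tag_a, tag_b, parent):
    """Lowest common ancestor of two tags, or None if they share no ancestor."""
    above_a = set(ancestors(tag_a, parent))
    for node in ancestors(tag_b, parent):
        if node in above_a:
            return node
    return None


# Hypothetical fragment of a food ontology (child -> parent).
ONTOLOGY = {
    "ramen": "japanese", "sushi": "japanese", "japanese": "asian",
    "pho": "vietnamese", "vietnamese": "asian", "asian": "cuisine",
    "pizza": "italian", "italian": "cuisine",
}

print(common_parent("ramen", "sushi", ONTOLOGY))  # japanese
print(common_parent("ramen", "pho", ONTOLOGY))    # asian
```

Note how "ramen" and "pho" collapse to the generic "asian": this loss of specificity is why ontology-based generalization is a poor fit for representing taste.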

3. Formulating taste as a vector

3.1. Explicit taste

We can model explicit taste as a bag of words in which each word is a self-declared tag and an attribute of a business. In the restaurant dataset we worked with, these tags are available as category tags, which signify the type of restaurant, and attribute tags, which describe the facilities offered. When a customer visits many restaurants, these category and attribute tags reveal an explicit pattern in his preferences. Similarly, in a movie review portal, the movie classification is a kind of explicit tag and would constitute an explicit part of the taste. The explicit taste that a business offers is declared up front and does not change from one review record to the next. In contrast, a customer's explicit taste tags have to be ascertained from his numerous review records, or footprints. For example, a customer may visit many restaurants that are not identical in category and attributes but share a few common tags. For a hypothetical customer, such common tags might be the category tag "nightlife" and the attribute tag "Happy hour". These common tags show up as the customer's preferences. Because each customer shares only a few of the many possible tags, the resulting customer-by-tag "explicit taste" matrix is sparse.
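The sketch below shows one way to derive a customer's explicit taste from the category and attribute tags of the businesses he has reviewed. The paper does not say how "common" tags are selected, so the support threshold (a tag must occur in at least half of the customer's businesses), the weights and the sample data are all assumptions.

```python
from collections import Counter

# Hypothetical declared tags per business (categories plus attributes).
BUSINESS_TAGS = {
    "b1": {"nightlife", "bars", "happy hour", "outdoor seating"},
    "b2": {"nightlife", "pubs", "happy hour"},
    "b3": {"nightlife", "italian", "happy hour", "takes reservations"},
    "b4": {"cafes", "wifi"},
}


def explicit_taste(reviewed, business_tags, min_support=0.5):
    """Tags carried by at least `min_support` of the businesses a customer reviewed,
    each weighted by the fraction of those businesses that carry it."""
    counts = Counter(tag for b in reviewed for tag in business_tags[b])
    n = len(reviewed)
    return {tag: c / n for tag, c in counts.items() if c / n >= min_support}


def taste_matrix(customers, business_tags, min_support=0.5):
    """Sparse customer-by-tag matrix as {customer: {tag: weight}}; missing tags mean 0."""
    return {c: explicit_taste(r, business_tags, min_support) for c, r in customers.items()}


customers = {"c1": ["b1", "b2", "b3", "b4"], "c2": ["b4"]}
print(taste_matrix(customers, BUSINESS_TAGS))
# c1 -> nightlife and happy hour (0.75 each); c2 -> cafes and wifi (1.0 each)
```

Storing only the non-zero entries per customer keeps the sparse matrix compact.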

Fig 1: Proposed taste framework

3.2. Implicit taste

In the proposed framework, we treated various available techniques as a toolbox, evaluating and extending existing research to arrive at the recommended methods. To begin with, we used a classification-based sentiment analysis technique to extract clearly subjective text. We then used an unsupervised, graph theoretical approach to find the key topics, following the original TextRank algorithm with some modifications. We started by extracting keywords defined by POS (parts of speech) patterns such as "adjective and noun" and built a graph of candidate keywords. First, we replaced the co-occurrence based similarity metric in TextRank with semantic distance computed using WordNet²⁴. As a second alternative, we derived the similarity metric from a Word2Vec²⁵ model built from our corpus, which captures the distributional properties of its words. We then applied PageRank²⁶ to this graph, sorted the keywords by their PageRank scores and retained only the top portion (e.g., 20%) of the list. Finally, we combined keywords into key phrases wherever the chosen keywords appear adjacent to each other in the original text.
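The modified TextRank steps above (a candidate-keyword graph weighted by semantic similarity, PageRank, retention of the top fraction and merging of adjacent keywords into phrases) could be sketched as follows. The sketch assumes precomputed word vectors, with a hypothetical toy dictionary standing in for a Word2Vec model; it uses cosine similarity as the edge weight and a hand-written weighted PageRank, and it is not the authors' code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two (non-zero) word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_pagerank(nodes, weight, d=0.85, iterations=50):
    """PageRank over a complete undirected graph with edge weights weight(u, v);
    negative similarities are clipped to zero."""
    w = {u: {v: max(weight(u, v), 0.0) for v in nodes if v != u} for u in nodes}
    out = {u: sum(w[u].values()) for u in nodes}
    score = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        score = {
            v: (1 - d) + d * sum(w[u][v] / out[u] * score[u]
                                 for u in nodes if u != v and out[u] > 0)
            for v in nodes
        }
    return score


def key_phrases(tokens, candidates, vectors, keep=0.2):
    """Rank candidate keywords, keep the top `keep` fraction, then merge kept keywords
    that are adjacent in the original token sequence into key phrases."""
    nodes = sorted(candidates)
    score = weighted_pagerank(nodes, lambda u, v: cosine(vectors[u], vectors[v]))
    n_keep = max(1, math.ceil(keep * len(nodes)))
    kept = set(sorted(nodes, key=lambda t: (-score[t], t))[:n_keep])
    phrases, run = [], []
    for tok in tokens + [None]:  # trailing sentinel flushes the last run
        if tok in kept:
            run.append(tok)
        elif run:
            phrases.append(" ".join(run))
            run = []
    return list(dict.fromkeys(phrases))  # drop repeated phrases, keep first-seen order


# Hypothetical 2-d vectors standing in for a trained Word2Vec model.
vectors = {"butter": (1.0, 0.0), "chicken": (1.0, 0.0), "soft": (0.0, 1.0)}
tokens = "the butter chicken was soft".split()
print(key_phrases(tokens, {"butter", "chicken", "soft"}, vectors, keep=0.5))  # ['butter chicken']
```

Swapping the `weight` function (for example, for a WordNet-based similarity) changes the graph without touching the ranking or merging steps.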

Because many key phrases were selected, we compressed them further by automatic annotation. For this step, we treated the task as a summarization problem rather than ontology-based categorization, framing the search for suitable annotations as a second-level text mining problem in which the dimensionality of the many selected key phrases must be reduced. As an individual's taste is too complex to be described by a single annotation, we opted for multiple annotations, or tags, to represent an individual's implicit taste. We tried three approaches, each producing an implicit taste vector of a suitable size n. First, we extracted the top n keywords by TF-IDF (term frequency-inverse document frequency). Second, we used non-negative matrix factorization, a dimensionality reduction technique, to extract the top n dimensions and the top 3 keywords in each dimension, and then picked n non-duplicate words from the basket of words representing these dimensions. Third, we applied k-means clustering to the basket of words representing the extracted key phrases and picked the central word of each of the n clusters, eliminating any duplicates. In our experiments, the clustering strategy yielded the best results. An obvious limitation of the whole framework is that it works well only for customers who have written a large number of text reviews: it is not possible to form a vector of n words if the number of reviews itself is less than n. This is the familiar cold start problem of recommendation systems, and the usual strategies of recommending the most popular products or falling back on recommendations based on existing social network information can be adopted here as well.
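The clustering strategy, which the text reports as giving the best results, can be sketched as plain k-means over word vectors, picking for each cluster the member word closest to its centroid. The paper does not state which word vectors or k-means settings were used, so the toy vectors, the deterministic initialization (the first n distinct words) and the fixed iteration count here are assumptions.

```python
import math


def kmeans_central_words(words, vectors, n, iterations=20):
    """k-means over word vectors; return the word nearest each non-empty cluster's
    centroid, with duplicates removed."""
    words = list(dict.fromkeys(words))            # de-duplicate, keep order
    centroids = [vectors[w] for w in words[:n]]   # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for w in words:                           # assignment step
            k = min(range(len(centroids)), key=lambda j: math.dist(vectors[w], centroids[j]))
            clusters[k].append(w)
        centroids = [                             # update step; empty clusters keep old centroid
            tuple(sum(dim) / len(members) for dim in zip(*(vectors[w] for w in members)))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
    central = [min(members, key=lambda w: math.dist(vectors[w], centroids[k]))
               for k, members in enumerate(clusters) if members]
    return list(dict.fromkeys(central))


# Hypothetical 2-d word vectors for words drawn from a customer's key phrases.
VECS = {
    "spicy": (1.0, 0.0), "fiery": (0.8, 0.1), "hot": (1.2, -0.1),
    "creamy": (0.0, 1.0), "buttery": (0.1, 0.9), "rich": (0.05, 0.95),
    "cheap": (-1.0, 0.0),
}
words = ["spicy", "creamy", "cheap", "fiery", "buttery", "hot", "rich"]
print(kmeans_central_words(words, VECS, n=3))  # ['spicy', 'rich', 'cheap']
```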

4. Deriving value from identified taste

As no benchmark is readily available for this domain, we decided to work with a real-life dataset. We chose the Yelp Challenge dataset because it provides rich information about both customers and businesses, along with many text reviews. To investigate the utility of the proposed taste framework, we used a relatively small restaurant dataset from the city of Edinburgh, Scotland, UK. We extracted the Edinburgh restaurant data subject to the conditions that each restaurant must have at least 50 reviews and each customer at least 20 reviews, to facilitate text mining. The resulting dataset contained 47 eligible businesses and 1077 reviews. It mentioned 2353 customers, counting both reviewers and friends of reviewers. Some customers had many friends, but most of these friends did not write reviews, and the condition of at least 20 text reviews reduced the number of eligible customers drastically. We derived a 5-word implicit taste vector to keep the computational load to a minimum. In pre-processing, we mapped the anonymized IDs to integer IDs, since the anonymized IDs are not necessarily printable characters. Although the dataset is small, it serves to demonstrate the approach without demanding powerful development hardware or much CPU time.
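The filtering and ID-mapping pre-processing could be implemented along the lines of the sketch below. The record layout (dicts with `user_id` and `business_id` keys, as in the Yelp review records) is assumed, and so is the order of filtering: the business threshold is applied first and the customer threshold once afterwards, without iterating until both hold.

```python
from collections import Counter


def filter_reviews(reviews, min_business_reviews=50, min_user_reviews=20):
    """Keep reviews of businesses with at least `min_business_reviews` reviews, then keep
    only customers with at least `min_user_reviews` of the remaining reviews."""
    per_business = Counter(r["business_id"] for r in reviews)
    reviews = [r for r in reviews if per_business[r["business_id"]] >= min_business_reviews]
    per_user = Counter(r["user_id"] for r in reviews)
    return [r for r in reviews if per_user[r["user_id"]] >= min_user_reviews]


def integer_ids(reviews, key):
    """Map opaque anonymized IDs under `key` to consecutive integers, in order of first appearance."""
    mapping = {}
    for r in reviews:
        mapping.setdefault(r[key], len(mapping))
    return mapping


# Tiny hypothetical sample with lowered thresholds.
reviews = [
    {"user_id": "u1", "business_id": "B1"}, {"user_id": "u1", "business_id": "B1"},
    {"user_id": "u2", "business_id": "B1"}, {"user_id": "u2", "business_id": "B2"},
]
kept = filter_reviews(reviews, min_business_reviews=2, min_user_reviews=2)
print(len(kept), integer_ids(kept, "user_id"))  # 2 {'u1': 0}
```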

Fig 4: social network based on explicit taste vectors

Fig 5: social network based on implicit taste vectors

We ran four experiments on the reduced dataset. In each, the customers were analyzed and an adjacency matrix was built to detect possible communities. The experiments were designed to test the effectiveness of the taste framework for community detection and the feasibility of a recommendation mechanism based on taste similarity. In the first experiment, communities were formed from declared friendships; this information is already available to the portals, and the objective of our work was to deliver additional value beyond it. In the second, communities were formed from common restaurants visited, in a way similar to collaborative filtering. The third was based on the customers' explicit tastes and the final one on the implicit taste vectors derived from their reviews. The resulting social network graphs are shown in Figures 2 to 5, and they show that distinct communities can be formed on the basis of taste. Finally, as an example, the most similar customers for a few given customers were identified from the calculated similarity.
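As an illustration of how the adjacency matrix and the "top few similar customers" could be computed from implicit taste vectors, the sketch below treats each customer's word vector as a set. The paper does not name its similarity measure, so Jaccard similarity, the edge threshold and the toy vectors are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two taste vectors treated as word sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency(tastes, threshold=0.2):
    """Symmetric 0/1 adjacency matrix over customers (rows/columns in sorted id order);
    1 when taste similarity reaches the (hypothetical) threshold."""
    ids = sorted(tastes)
    matrix = [[1 if i != j and jaccard(tastes[i], tastes[j]) >= threshold else 0
               for j in ids] for i in ids]
    return ids, matrix


def top_similar(customer, tastes, k=4):
    """The k other customers whose taste vectors are most similar to `customer`'s."""
    others = [c for c in tastes if c != customer]
    return sorted(others, key=lambda c: (-jaccard(tastes[customer], tastes[c]), c))[:k]


# Hypothetical 5-word implicit taste vectors keyed by integer customer id.
tastes = {
    1: ["curry", "naan", "spicy", "lamb", "chutney"],
    2: ["curry", "naan", "biryani", "spicy", "lassi"],
    3: ["pasta", "pizza", "wine", "tiramisu", "espresso"],
    4: ["pizza", "wine", "pasta", "risotto", "gelato"],
}
print(top_similar(1, tastes, k=1))  # [2]
```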

Table 1. Examples of top few neighbors in a customer's community

Customer id | All declared friends | Top few similar customers as per explicit taste | Top few similar customers as per implicit taste
1 | 846, 709, 238, 624, 756, 181, 633, 827, 862, 895 | 1076, 269, 394, 385 | 1, 0, 26, 29
2 | 65, 17, 20, 238, 543, 306, 181, 443, 188, 575, 449, 706, 324, 846, 862, 827, 615, 877, 1006, 624, 498, 756, 405 | 885, 915, 926, 927 | 2, 27, 10, 16
3 | 65, 405, 17, 20, 238, 543, 306, 827, 188, 575, 449, 706, 324, 709, 846, 862, 443, 615, 877, 1006, 624, 498, 756, 633, 895 | 1076, 269, 394, 385 | 10, 16, 5, 2

The social network graphs above, constructed from adjacency matrices based on the calculated similarity, show that latent communities exist beyond the obvious ones defined by declared friend relations. More importantly, the explicit and implicit taste vectors reveal hidden communities even in this small dataset and create new possibilities for customer-facing applications. Table 1 lists the most similar customers for a few customer IDs. Without such a way of measuring similarity between tastes, the recommendation systems prevalent in online review and rating portals have limited capabilities. These results show that, with similarity analysis based on explicit and implicit taste vectors, communities can be formed and products can be recommended even when customers have not declared any friend relations and the products have no clearly defined features. With more powerful development hardware and more CPU time, a much larger dataset could be processed, and we expect similar results to hold at that scale.

5. Conclusion

In this paper, we have approached user modelling by putting together a framework that formulates the explicit and implicit components of taste from online reviews. We call it a framework because different options can be exercised at different stages, such as key phrase extraction and annotation. We built it by evaluating existing methods, using them as a toolbox and extending them wherever necessary. This approach models users directly, instead of taking the indirect approach of collaborative filtering in recommendation systems. It also attempts to close the gap left by the lack of content-based recommendation systems for products and services without clearly defined feature lists. The benefits are not limited to recommendation, however. For example, as taste vectors are identified across the community, marketing can be focused more effectively on customer segments and sub-segments, and more customer-facing applications may take advantage of this framework.

As an immediate next step, we plan to replicate this work on a much larger dataset using high-end development hardware, to demonstrate the same results at a much larger scale. A suitable academic dataset also needs to be identified or prepared for this effort. We would then like to formalize the work by building a hybrid recommendation system based on this concept of taste, measuring its performance and publishing a comparison against baseline recommendation system metrics.

References

1. J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez. Recommender systems survey. Knowledge-Based Systems 46, 2013. p. 109-132.

2. Gediminas Adomavicius and Alexander Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6. 2005. p. 734-749.

3. Shaghayegh Sahebi and William Cohen. Community-Based Recommendations: a Solution to the Cold Start Problem. Workshop on Recommender Systems and the Social Web (RSWEB), held in conjunction with ACM RecSys'11, Chicago. 2011.

4. I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2009. p. 195-202.

5. Tien T. Nguyen and John Riedl. Predicting Users' Preference from Tag Relevance. UMAP 2013, Springer-Verlag Berlin Heidelberg, LNCS 7899. 2013. p. 274-280.

6. Uwe Malinowski, Thomas Kuhme, H. Dieterich and M. Schneider-Hufschmidt. A taxonomy of adaptive user interfaces. HCI'92 Proceedings of the Conference on People and Computers VII. 1993. p. 391-414.

7. Laura Dietz, Ben Gamari, John Guiver, Edward Snelson, Ralf Herbrich. De-Layering Social Networks by Shared Tastes of Friendships. Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. 2012.

8. Hugo Liu, Pattie Maes, Glorianna Davenport. Unraveling the Taste Fabric of Social Networks. Social Networking Communities and E-Dating Services: Concepts and Implications. IGI Global. 2009.

9. Eva Jaho, Merkouris Karaliopoulos and Ioannis Stavrakakis. ISCoDe: A framework for interest similarity based community detection in social networks. Computer Communications Workshops (INFOCOM WKSHPS), IEEE Conference. 2011.

10. B. Magnini and C. Strapparava. Improving user modelling with content-based techniques. In: M. Bauer, P. Gmytrasiewicz and J. Vassileva, editors, User Modeling 2001, volume 2109 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg. 2001. p. 74-83.

11. Ken Barker and Nadia Cornacchia. Using noun phrase heads to extract document keyphrases. Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence. 2000. p. 40-52.

12. Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval. 2008.

13. Peter Turney. Coherent keyphrase extraction via web mining. Proceedings of the 18th International Joint Conference on Artificial Intelligence. 2003. p. 434-439.

14. Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. SemEval-2010 Task 5: Automatic keyphrase extraction from scientific articles. Proceedings of the 5th International Workshop on Semantic Evaluation. 2010. p. 21-26.

15. Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin, and Craig G. Nevill-Manning. KEA: Practical automatic keyphrase extraction. Proceedings of the 4th ACM Conference on Digital Libraries. 1999. p. 254-255.

16. Yutaka Matsuo and Mitsuru Ishizuka. Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools. 2004. p. 13.

17. Zhiyuan Liu, Peng Li, Yabin Zheng, and Maosong Sun. Clustering to find exemplar terms for keyphrase extraction. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. 2009. p. 257-266.

18. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research 3. 2003. p. 993-1022.

19. Rada Mihalcea and Paul Tarau. TextRank: Bringing order into texts. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004. p. 404-411.

20. Adrien Bougouin, Florian Boudin, and Béatrice Daille. TopicRank: Graph-based topic ranking for keyphrase extraction. Proceedings of the 6th International Joint Conference on Natural Language Processing. 2013. p. 543-551.

21. Xiaojun Wan and Jianguo Xiao. CollabRank: Towards a collaborative approach to single-document keyphrase extraction. Proceedings of the 22nd International Conference on Computational Linguistics. 2008. p. 969-976.

22. Olena Medelyan, Eibe Frank, and Ian H. Witten. Human-competitive tagging using automatic keyphrase extraction. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. 2009. p. 1318-1327.

23. Panos Alexopoulos, John Pavlopoulos, Manolis Wallace and Konstantinos Kafentzis. Exploiting ontological relations for automatic semantic tag recommendations. I-SEMANTICS, 7th International Conference on Semantic Systems. 2011.

24. WordNet, a lexical database for English. https://wordnet.princeton.edu/

25. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26 (NIPS). 2013.

26. Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks 30 (1-7). 1998. p. 107-117.