
Abstract of research paper on Computer and information sciences, author of scientific article — James A. Danowski

Abstract Collaborative Innovation Networks (COINs) are typically defined using individuals as nodes. Departments in organizations or higher order units can, however, be considered as forming COINs of interest. WORDij 3.0 software enables locating COINs at various levels of analysis, including organizational departments. This paper reports on an analysis of interdepartmental collaboration networks of a college based on co-occurrence of department names in news stories about the college. It demonstrates the face validity and utility of using WORDij 3.0 to identify COINs at levels of analysis higher than the individual.



Procedia Social and Behavioral Sciences 2 (2010) 6304-6417

COINs2009: Collaborative Innovation Networks Conference

Identifying Collaborative Innovation Networks At the Inter-Departmental Level

James A. Danowski, Ph.D.*

Department of Communication, University of Illinois at Chicago, Chicago IL 60607 USA



Keywords: networks, organization, collaboration, interdepartmental, semantic networks

1. Introduction

There is no need to go into detail on how important social network analysis has become across a wide landscape of disciplines. Moreover, in his sweeping assessment of research, the National Science Foundation's co-director of Human-Centered Computing, Bainbridge (2004), states: "Understanding semantic approaches to cultural systems would enable the engineering of culture..." (p. 174). The research presented in this paper illustrates one example of Bainbridge's profound semantic revolution: the study of social actors using semantic-network analysis of full texts of large volumes of news stories obtained from the Web.

* Corresponding author. Tel.: +0-312-996-3187; fax: +0-312-996-2125. E-mail address:

1877-0428 © 2010 Published by Elsevier Ltd. doi:10.1016/j.sbspro.2010.04.050

Automatic mapping of social networks of actors is of interest in the fields of communication, management, and other social sciences. Here we demonstrate new procedures that can mine high volumes of text using the WORDij 3.0 software (Danowski, 2009), which can map names of organizations, departments, persons, places, and words in the same network. Networks of departments in an organization are our focus in this paper.

The techniques demonstrated in this research allow for several kinds of uses: 1) descriptive analysis of social networks, 2) finding how social network indices are associated, 3) analyzing network indices over time-series, 4) associating network characteristics with other data, 5) conducting time series analysis of network data in relation to other kinds of data, 6) comparing different networks for features that are similar and/or different, 7) linking social networks to semantic networks and mapping them in the same representations.

Because the current software grew out of semantic analysis based on word co-occurrences mined from large bodies of text, it maps not only word networks but also networks among social actors. This is because names are a type of word. By using an include list, the opposite of a stop list or drop list, the current software can ignore all words except the names of the target social actors. Or, if desired, it can map both the actors and the other words connected with them in the same network.

One can use a list of names from an a priori source (here, the names of departments in an institution of higher education) or from an initial exploratory analysis of the texts. In the latter method, WORDij 3.0 can be set to output all proper nouns, including person, place, and organizational names and those of objects and formal concepts. This output is automatically prepared as a string replacement list so that multi-word proper nouns are converted to a single term, or unigram. The user can edit the list to keep only the names of entities of interest. Then the WordLink routine in WORDij 3.0, given the string replacement and include lists, will produce the social network among those names within the body of text input into it.

We analyzed the networks among departments in a college of higher education occurring in news stories for the institution in its local market and larger media market for each of four years. Our goal is to demonstrate the basic techniques of doing a type of automatic social network analysis.

This paper makes four key points: 1) Collaborative innovation often takes place in organizational settings with resources and constraints shaped by the system. 2) The individual level of analysis used in most COINs research ignores the departmental level in organizations, where participants consider departments to be the collaborating units. 3) The betweenness centrality measures used in most COINs research to identify innovative groups of individuals are not appropriate, given the assumptions that such collaborations involve communication processes in which messages need not always flow through the shortest path, can be distributed through more than one path either synchronously or asynchronously, and may be increased in frequency of communication by the social actors. Borgatti (2005) points out that betweenness centrality is inconsistent with these assumptions because it is based on finding one shortest path linking each pair of actors, treats links as present or absent rather than as having valued strengths, and assumes one message is disseminated down each shortest path. Instead, flow betweenness is the measure that is consistent with the theoretical and practical assumptions about communication, and it is the measure used for identifying COINs in this research. 4) How media represent collaborations can be important both for mapping COINs and for observing how these are portrayed in media, thus possibly indirectly influencing innovation through audience feedback loops to the COINs that may communicate perceptions that influence both the participants and their social observers, communicate changes in resources, or impose constraints on the innovation.

2. Methods

A list of the department names in the college was obtained from college personnel. To assemble the corpora to mine, we searched Lexis/Nexis Academic using Power Search to identify stories about the college from 2005 to 2008. Every story containing the college's full name or acronym was obtained, so we performed a census of the relevant text universe. We then aggregated all of these files into one text file and used the TimeSlice utility in WORDij 3.0 to segment the file into four annual files. Each of the four text files was automatically analyzed using WORDij 3.0 to measure the co-occurrence of the department names.

WORDij was originally designed to analyze large numbers of co-occurring words to create semantic networks. Nevertheless, social actors' names are indeed words, and mining for their co-occurrence is no different. WORDij 3.0 has not only a stop-word list, or drop list, but also its opposite, an include list, which maps the network only among the words on it. Additional features that aid multi-node-type analysis (including people, organizations, places, and formal concepts and objects) are the enumeration of proper nouns and the automatic creation of include lists from them. For this paper, using WORDij 3.0's string replacement and include list functions, all aliases we created for each department's name were converted to a single string, and then proximity-based co-occurrences were computed.

2.1 Automatic Link Coding with Proximities not with "Bag of Words"

Proximity co-occurrence indexing (Danowski, 1982, 1993a, 1993b, 2009; Diesner & Carley, 2004) avoids the problems of the simplistic "bag of words" approaches common in Information Science and Information Retrieval. While word bags are useful for document retrieval, they blur social meaning by ignoring the relationships of social units within the texts, whether these units are words, people, or other entities. It is more analytically precise to use a proximity criterion in defining relationships among the entities being network analyzed. We used a three-word window, operating on the text file after all words except the names of departments were automatically removed by the use of the WordLink include list of department names.
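The windowing step described above can be illustrated with a minimal Python sketch. It is not WORDij's implementation: the function name `cooccurrences`, the unordered pair counting, and the sample sentence are assumptions made for illustration, and WORDij's exact counting rules may differ.

```python
from collections import Counter

def cooccurrences(tokens, include, window=3):
    """Count unordered pairs of included names inside a sliding window.

    Hypothetical helper: the window is applied AFTER all tokens not on the
    include list have been dropped, as the text describes.
    """
    kept = [t for t in tokens if t in include]
    pairs = Counter()
    for i, left in enumerate(kept):
        # Pair `left` with the next (window - 1) kept tokens.
        for right in kept[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Illustrative sentence, not taken from the study's corpus.
tokens = "the Fashion and Animation faculty joined Interior and Painting students".split()
include = {"Fashion", "Animation", "Interior", "Painting"}
counts = cooccurrences(tokens, include, window=3)
```

With a window of three, `Fashion` and `Painting` are never paired here because two other department names stand between them once the remaining words are removed.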

2.2 Creating the String Replacement and Include Lists

The first step in preparing the list of names for the network analysis in WORDij 3.0 is to create a string replacement list, an advanced option. This converts aliases for each name into a unigram. Table 1 shows an example. Table 2 is the include file used.

Table 1. Examples of String Replacement Files

Department of Advertising and Design->Advert_Design
Dept. of Advertising and Design->Advert_Design
Dept of Advertising and Design->Advert_Design
Advertising and Design Department->Advert_Design
Advertising and Design Dept.->Advert_Design
Advertising and Design Dept->Advert_Design
Advertising and Design->Advert_Design

Department of Accessory Design->Accessory
Dept. of Accessory Design->Accessory
Dept of Accessory Design->Accessory
Accessory Design->Accessory
Accessory Design Department->Accessory
Accessory Design Dept.->Accessory
Accessory Design Dept->Accessory
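How a replacement list of this form collapses aliases into unigrams can be sketched in a few lines of Python. The helper names `load_replacements` and `apply_replacements`, the case-sensitive matching, and the longest-alias-first ordering are assumptions for illustration; the paper does not say how WORDij orders or matches its replacements.

```python
def load_replacements(lines):
    """Parse 'alias->unigram' lines into (alias, unigram) pairs."""
    pairs = []
    for line in lines:
        if "->" in line:
            alias, unigram = line.split("->", 1)
            pairs.append((alias.strip(), unigram.strip()))
    # Assumption: apply longer aliases first, so that "Dept. of Accessory
    # Design" is replaced before the shorter "Accessory Design" can match.
    return sorted(pairs, key=lambda p: len(p[0]), reverse=True)

def apply_replacements(text, pairs):
    """Replace every alias in `text` with its unigram (case-sensitive)."""
    for alias, unigram in pairs:
        text = text.replace(alias, unigram)
    return text

# A subset of the Table 1 entries.
rules = load_replacements([
    "Department of Accessory Design->Accessory",
    "Dept. of Accessory Design->Accessory",
    "Accessory Design->Accessory",
    "Advertising and Design Department->Advert_Design",
    "Advertising and Design->Advert_Design",
])
# Illustrative sentence, not taken from the study's corpus.
sentence = "The Dept. of Accessory Design and the Advertising and Design Department co-hosted a show."
cleaned = apply_replacements(sentence, rules)
```

After replacement, each department appears as a single token that an include list can match.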

[Screenshot of the WORDij WordLink tab: fields for the source text file (UTF-8) and the drop list file, thresholds for dropping words and pairs appearing less often than 3, the linkage strength method (constant), and a window size of 3 for extracting word pairs, with Advanced Options, Quit, and Analyze Now buttons.]

Figure 1. WORDij 3.0 Interface

[Screenshot of the Advanced Options panel: file selectors for the character filter file, include list file, and string replace list file, and checkboxes for outputting word and pair statistics, outputting clean text, printing a BOM to mark UTF-8 output files, ignoring word-pair order, removing words containing numbers, performing analysis at sentence level, filtering HTML tags, removing punctuation inside words, using the Porter stemming algorithm, using the Chinese filter, and expanding contractions, with a Back button.]

Figure 2. WORDij 3.0 Interface - Advanced Options

Table 2. Include File with Department Names

Vis_Effects
Foundation

2.3 Post-Processing of Link Data for Centrality Measures

The WORDij 3.0 program has the option of producing a network file in the Pajek .net format (Batagelj & Mrvar, 1998). This is one of the import file types that UCINET (Borgatti, Everett, & Freeman, 2002) accepts and converts to its system files. We chose UCINET because it is widely accepted in the social network analysis community and we wished to use common, validated centrality indices to profile the structures of the interdepartmental networks. Given the status of UCINET and the ease of importing the output, we felt no need to incorporate centrality measures into WORDij.
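For readers unfamiliar with the format, the following Python sketch writes weighted, undirected pair counts as a basic Pajek .net file (a `*Vertices` section followed by an `*Edges` section). The function name `to_pajek` and the sample counts are hypothetical, and the layout of WORDij's own output (for example, whether it writes `*Edges` or `*Arcs`) is not specified in the paper.

```python
def to_pajek(pair_counts):
    """Render {(name_a, name_b): weight} as text in Pajek .net format."""
    nodes = sorted({name for pair in pair_counts for name in pair})
    # Pajek vertex ids start at 1.
    index = {name: i for i, name in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[name]} "{name}"' for name in nodes]
    lines.append("*Edges")  # undirected ties, one per line: source target weight
    for (a, b), weight in sorted(pair_counts.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"

# Made-up counts for illustration only.
net = to_pajek({("fashion", "painting"): 4, ("animation", "fashion"): 2})
```

The resulting text can be saved with a `.net` extension and imported into network analysis packages that read Pajek files.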

Although betweenness centrality (Freeman, 1979) is the most often used metric, it is limiting given the quality of data that we have. Betweenness is computed on dichotomized, linked/not-linked relationships. Our data are continuous and highly varying. Among a number of different centrality measures, eigenvector centrality (Bonacich, 2007) is useful when the links have scalar values. This eliminates the need for binarizing the network, which discards valuable variance on link strengths. Additionally, if one is using news stories, email, or other forms of communication as input texts, then it is more appropriate to use Flow Betweenness than Betweenness Centrality. Borgatti (2005) discusses how most uses of betweenness centrality are inappropriate.
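To make concrete how eigenvector centrality uses valued links without binarizing them, here is a minimal power-iteration sketch in Python. It is an illustration under stated assumptions, not UCINET's routine: the function name, the made-up weights, the normalization (largest score set to 1), and the identity shift used to stabilize the iteration are all choices made for this example.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on a symmetric weighted network {(a, b): w}."""
    nodes = sorted({n for pair in weights for n in pair})
    adj = {n: {} for n in nodes}
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # New score = own score + weighted sum of neighbours' scores.
        # Adding the node's own score (iterating on A + I) leaves the
        # leading eigenvector unchanged but avoids oscillation.
        new = {n: score[n] + sum(w * score[m] for m, w in adj[n].items())
               for n in nodes}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}  # largest score becomes 1
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score

# Made-up link strengths for illustration only.
scores = eigenvector_centrality({
    ("painting", "photog"): 5,
    ("painting", "interior"): 3,
    ("photog", "interior"): 1,
})
```

In this toy network, `painting` receives the highest score because its ties are both strong and attached to other well-connected nodes, which a dichotomized version of the same triangle could not distinguish.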

2.4 Combining Visualization with Statistical Network Centrality of Actors

A fundamental tenet of data analysis is to first visualize the data. WORDij 3.0 has VISij for creating static images or time-series movies of changes in network composition and structure. Our interest in this paper is in profiling the aggregate networks of departments year by year, so a series of static representations results. While VISij has time-series visualizations that NetDraw (Borgatti, 2002) does not have, NetDraw has more options for rendering static networks, such as drawing larger circles for more central nodes. We used eigenvector centrality to visually render the nodes' network position, because NetDraw does not compute flow betweenness. For link strength we used the maximum available range of link thickness, from 0 to 12. Our larger array of strengths was converted to this scale.
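The paper does not state how the strengths were converted to the 0 to 12 thickness scale. One simple possibility, shown here purely as an assumption, is proportional scaling so that the strongest link gets the maximum thickness; the function name `rescale` and the sample values are hypothetical.

```python
def rescale(strengths, new_max=12.0):
    """Proportionally map raw link strengths onto the range 0..new_max.

    Assumed conversion (not documented in the paper): the strongest link
    maps to new_max and a strength of zero stays at zero.
    """
    largest = max(strengths.values())
    if largest == 0:
        return {pair: 0.0 for pair in strengths}
    return {pair: new_max * value / largest for pair, value in strengths.items()}

# Made-up co-occurrence counts for illustration only.
thickness = rescale({
    ("fashion", "painting"): 120,
    ("animation", "fashion"): 30,
    ("interior", "print"): 3,
})
```

A proportional mapping preserves the ratios among link strengths, whereas a min-max mapping would draw the weakest link with zero thickness.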

Although visualizing data is essential to help place further statistical analysis in context, it has its limitations, beyond the lack of rules for analysts to use in assessing network visualizations. Spring-embedded layout procedures present the analyst with a different vantage point on the network each time they are run on the same data, which can result in differing interpretations. Using statistical information following visual inspection of networks therefore affords the analyst the best of each mode.

While visualization may have sufficient face validity to support action with respect to the network when there are small numbers of social actors, it becomes increasingly less useful as the number of nodes and links increases above 30. In addition, how intensively and extensively these nodes are linked can add to visual information overload, rendering interpretation of networks of questionable validity. It is difficult to make effective interpretations when the network looks like a cross-cultural accident of a big bowl of spaghetti with jambalaya on top, as is usually the case with visual output from Crawdad software (Corman, Kuhn, McPhee, & Dooley, 2002) rather than like a plate of sushi.

3. Results

There were 1946 full text documents from Lexis/Nexis Academic for 2005-2008 for the college.

Table 3. Interdepartmental Normalized Flow Betweenness by Year*


Mean 2.23
Std Dev 3.99
Network Centralization Index 10.83%

fashion 12.62
animation 12.54
interior 11.89
filmtv 5.05
architec 3.62
performing 3.62
painting 2.56
print 1.81
sequential 0.92
photog 0.58
illus 0.40


Mean 1.41
Std Dev 3.10
Network Centralization Index 10.308%

painting 11.22
teaching 9.34
performing 4.86
photog 2.28
animation 0.85
writing 0.78
foundation 0.13
fashion 0.06
jewelry 0.06


Mean 3.78
Std Dev 5.67
Network Centralization Index 14.04%

painting 17.25
fashion 15.60
animation 14.42
performing 12.40
interior 11.72
architec 7.32
print 6.97
filmtv 3.35
jewelry 1.69
urban 1.38
writing 1.28
accessory 0.95
teaching 0.18


Mean 4.87
Std Dev 7.89
Network Centralization Index 23.91%

photog 27.82
painting 26.39
interior 17.50
performing 10.67
architec 10.25
teaching 9.83
fashion 4.16
sequential 3.56
animation 3.05
filmtv 2.89
jewelry 2.49
foundation 1.74
urban 1.25
sculpture 0.16

* Non-zero departments shown

4. Discussion

This study focused on developing a new method for identifying the social network structures that emerge in analyzing co-occurrences of social actors in news stories using automated text mining. This was accomplished

The method used and described in this paper appears to have face validity and is worthy of further refinement. This observation is supported by the characteristics of the presidential cabinets studied and the extent to which the network structures mapped appear to have correspondence to how these administrations were structured, operated, and viewed by the media. Further investigation of correspondences to more systematically evaluate the validity of the method of automatic social actor network measurement appears justified and likely to be fruitful.

This research has demonstrated the feasibility of automatic social actor network analysis based on data mining of large volumes of news stories. Whenever one has a list of actors, be they individuals, departments, organizations, nations, or whatever the basic social unit to be analyzed, and one wants to automatically map networks from news stories, this can easily be done with WORDij 3.0. Such methods are founded on the assumption that the representation of social actors in news stories or other web documents is of interest.

Future research of potential interest would be the simultaneous mapping of concepts and objects along with the social actors, representing all of these in the same network. In this way, one could automatically observe large numbers of social networks, viewing the word networks with which the social actors are associated and the objects and geographic locations to which they are linked. Some scholars (Latour, 2005) and some practical network analysts may prefer to treat as nodes in the same network the social units, the words they use or that are used to describe them, place names, and other proper nouns. WORDij 3.0 also enables this kind of "actor network theory" mixture of nodes in the same networks. Further research might find exploration of these features valuable.

Figure 3. 2005 Interdepartmental Network


Figure 4. 2006 Interdepartmental Network

Figure 5. 2007 Interdepartmental Network

Figure 6. 2008 Interdepartmental Network

Figure 7. Aggregate Interdepartmental Collaboration Network: 2005-2008.

5. References

Bainbridge, W. S. (2004). The evolution of semantic systems. Annals of the New York Academy of Sciences, 1013, 150-177.

Batagelj, V., & Mrvar, A. (1998). Pajek - Program for large network analysis. Connections, 21(2), 47-57.

Bonacich, P. (2007). Some unique properties of eigenvector centrality. Social Networks, 29(4), 555-564.

Borgatti, S. P. (2002). NetDraw: Graph visualization software [computer program]. Harvard, MA: Analytic Technologies.

Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27, 55-71.

Borgatti, S.P., Everett, M.G. and Freeman, L.C. (2002). UCINet for Windows: Software for social network analysis [computer program]. Harvard, MA: Analytic Technologies.

Corman, S. R., Kuhn, T., McPhee, R. D., & Dooley, K. J. (2002). Studying complex discursive systems: Centering Resonance Analysis of communication. Human Communication Research, 28(2), 157-206.

Danowski, J. A. (1982). A network-based content analysis methodology for computer-mediated communication: An illustration with a computer bulletin board, in M. Burgoon (Ed.), Communication Yearbook 5 (pp. 904-925). New Brunswick, NJ: Transaction Books.

Danowski, J. A. (1993a). WORDij 3.0: A word-pair approach to information retrieval. Proceedings of the DARPA/NIST TREC Conference, Washington, DC.

Danowski, J. A. (1993b). Network analysis of message content. In R.E. Rice, W. Richards, & G. Barnett (Eds.), Progress in communication sciences XII (pp. 197-222). Norwood, NJ: Ablex.

Danowski, J. A. (2009). Inferences from word networks in messages. In K. Krippendorff & M. A. Bock (Eds.), The content analysis reader (pp. 421-429). Sage Publications.

Danowski, J.A. (2009). WORDij 3.0 [computer program]. Chicago: University of Illinois.

Diesner, J., & Carley, K. M. (2004). AutoMap 1.2: Extract, analyze, represent, and compare mental models from texts. Carnegie-Mellon University.

Freeman, L.C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215-239.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. London: Oxford University Press.