What is Data? What is Information?

Florian Allwein from London School of Economics will present research into What is Data? What is Information? Read the summary below.

As the field of Information Systems (IS) research is increasingly concerned with data, a clear definition of this term, especially in distinction from information, would be desirable. As McKinney & Yoos (2010) show, “’Information’ is poorly defined in the Information Systems research literature, and is almost always unspecified, a reflexive, all-purpose but indiscriminant solution to an unbounded variety of problems.” (p. 329). This paper builds on this work and looks at how the term ‘data’ has been used in recent IS research. It finds that most authors take an implicit view of data as unprocessed information. Based on an outside definition of data as “facts of the world”, the paper relates these concepts to the ontology of critical realism, arguing that ‘data’ should refer to items in the realm of the actual, whereas ‘information’ should refer to items in the realm of the empirical. IS can thus be seen as efforts to capture the facts of the world from the realm of the actual and store them (in the realm of the empirical) in order to make them accessible for analysis. The paper ends by outlining how this can shape future research.

Exploring generative evolution in open source software ecosystems

Alexander Eck from University of St Gallen will present research into Exploring generative evolution in open source software ecosystems. Read the summary below.

Generative systems, or socio-technical systems that change through contributions from broad and varied audiences without central oversight and planning, arguably drive innovation in the digital economy. Despite growing scholarly interest, most work on the generativity phenomenon – what it is and how it plays out – has remained conceptual to date. We present an ongoing research project that aims to detect and trace generative evolution across a broad range of distributed software ecosystems in the open source community. The empirical research is based on digital trace data on over 140,000 software artifacts, the technical dependencies between them, and the various design activities carried out over a four-year period. We employ computational approaches to make sense of this large amount of data. Once completed, the study is expected to greatly advance generativity research from a data-driven, explorative point of view.
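The abstract does not specify the study's pipeline, but one basic building block of this kind of trace-data analysis can be sketched: representing artifacts and their technical dependencies as a graph and counting each artifact's transitive dependents, a simple proxy for how far a change could ripple through the ecosystem. The artifact names and dependency lists below are invented toy data.

```python
# Hedged sketch (not the study's actual method): a toy dependency graph
# over software artifacts, inverted to ask "who depends on me?", then
# traversed to count transitive dependents of each artifact.
from collections import defaultdict

deps = {  # artifact -> artifacts it depends on (invented toy data)
    "app": ["libjson", "libhttp"],
    "libhttp": ["libssl"],
    "libjson": [],
    "libssl": [],
}

# Invert the edges: for each artifact, which artifacts depend on it?
dependents = defaultdict(set)
for artifact, targets in deps.items():
    for t in targets:
        dependents[t].add(artifact)

def transitive_dependents(artifact):
    """All artifacts that directly or indirectly depend on `artifact`."""
    seen, stack = set(), [artifact]
    while stack:
        for d in dependents[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen
```

On the toy graph, `transitive_dependents("libssl")` returns both `libhttp` and `app`, since a change to `libssl` can propagate through `libhttp` up to `app`.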

Patient satisfaction with NHS services

Radoslaw Kowalski from University College London will present research into Patient satisfaction with NHS services. Read the summary below.

Currently, patient satisfaction with NHS services is estimated with measures that bear little relation to the self-reported needs of patients or that rely on outdated data. Nonetheless, healthcare institutions depend on funding that is decided in part on the basis of these poorly calculated satisfaction measures. As a result, patients’ actual best interests may conflict with the best interests of the NHS organisations being evaluated. Patients may receive suboptimal health services and lose trust in the professionalism and intentions of doctors and health organisations that try to stick to performance targets. The reputation of the medical professions may also suffer, prompting health professionals to seek work elsewhere. A new organisational performance measurement tool, the aim of this study, could break this vicious cycle of distrust and improve the quality of healthcare by measuring patient satisfaction more accurately. More accurate measurements of patient satisfaction could enable cost savings by reducing unwanted services, and could improve staff retention. The study involves processing online reviews of NHS services in England with topic modelling, sentiment analysis and the calculation of an importance score for each topic. Calculating importance scores as well as sentiments for each topic is aimed at overcoming a common misconception: that a topic’s importance is equivalent to how often it appears in reviews. Organisational decision-makers who assume greater importance for more commonly mentioned topics actually receive a skewed picture of customer preferences. The outputs of the proposed method are compared with insights from existing patient satisfaction indicators to test the new method’s validity and reliability.
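The distinction the abstract draws, between how often a topic is mentioned and how much it actually matters, can be made concrete in a toy sketch. The reviews, topic assignments and sentiment lexicon below are invented stand-ins; the study itself uses a full topic-modelling and sentiment pipeline, whereas the importance score here is just one plausible formulation (the gap in overall rating between reviews that mention a topic and those that do not).

```python
# Hedged sketch: per-topic sentiment plus an "importance" score that is
# deliberately NOT the topic's mention frequency. All data below is toy.
from collections import defaultdict
from statistics import mean

# Tiny invented sentiment lexicon (stand-in for a real sentiment model).
LEXICON = {"helpful": 1, "kind": 1, "clean": 1, "rude": -1, "slow": -1, "dirty": -1}

reviews = [  # invented reviews with pre-assigned topics and overall ratings
    {"text": "staff were kind and helpful", "topics": ["staff"], "rating": 5},
    {"text": "waiting was slow", "topics": ["waiting"], "rating": 2},
    {"text": "ward was clean but staff rude", "topics": ["facilities", "staff"], "rating": 3},
]

def sentiment(text):
    """Average lexicon score of the words that hit the lexicon."""
    hits = [LEXICON[w] for w in text.split() if w in LEXICON]
    return mean(hits) if hits else 0.0

# Per-topic average sentiment across the reviews mentioning that topic.
by_topic = defaultdict(list)
for r in reviews:
    s = sentiment(r["text"])
    for t in r["topics"]:
        by_topic[t].append(s)
topic_sentiment = {t: mean(v) for t, v in by_topic.items()}

def importance(topic):
    """How a topic's presence co-varies with the overall rating --
    distinct from how often the topic is mentioned."""
    with_t = [r["rating"] for r in reviews if topic in r["topics"]]
    without = [r["rating"] for r in reviews if topic not in r["topics"]]
    if not with_t or not without:
        return 0.0
    return mean(with_t) - mean(without)
```

Under this toy formulation, "waiting" is mentioned only once yet carries a strongly negative importance score, illustrating why frequency alone is a misleading proxy for what patients care about.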

Analysing and Exploring Drifts in Innovation Streams within Open Source

André A. Gomes de Souza from Manchester Business School will present research into Analysing and Exploring Drifts in Innovation Streams within Open Source. Read the summary below.

This work empirically explores Apache Hadoop in the context of outbound open innovation (OI) in SMEs through the lens of innovation streams. Apache Hadoop is an open source (F/OSS) software library for distributed computer processing, and it is the industry standard for big data analysis. Organisations have radically changed the way they store, manipulate, and create value from information. They obtain data from different sources and in diverse formats. These data have become significant corporate assets and the foundation of new business opportunities. The notion of OI, including F/OSS, brought new concepts of value production to light. F/OSS has moved away from its initial model of dispersed and decentralised governance and control. The established peer-based configuration is now being replaced by more advanced business models, leading to the growth of what Brian Fitzgerald has termed OSS 2.0. Outbound OI in F/OSS SMEs’ technology spin-offs relates to the innovation streams paradigm in terms of discontinuous innovation. Innovation streams are sets of innovations that build upon an organisation’s current products and services, extend that organisation’s technical direction, and/or help it diversify into different markets. This study draws on the interpretative case study tradition, and its findings have implications both for organisations offering big data products and services and for OI/F/OSS researchers. Many questions regarding this relationship remain, and this work addresses some of these unanswered issues.

Energy consumption patterns and what they can tell us

Anastasia Ushakova from University College London will present research into Energy consumption patterns and what they can tell us. Read the summary below.

The project explores how patterns of energy use by different groups of consumers vary according to the time of day and week, as well as with the seasons. At the initial stage, in line with one of the most salient issues in energy policy, we are exploring how much smart meter data can tell us about fuel poverty in the UK, and more specifically how this can be achieved by linking these data to socio-economic characteristics and property types. As more administrative data sources become available, it looks feasible to create a comprehensive database that describes output areas in terms of their energy consumption. The further question we try to answer is how these data can underpin an estimate of the probability that a selected customer is vulnerable or at risk of fuel poverty. With the aim of helping energy suppliers meet the goals of the Energy Company Obligation (ECO), we combine qualitative and quantitative research to design an algorithm that identifies vulnerable energy customers from their energy consumption patterns and household characteristics. The research can serve as an identification strategy for classification and prediction based on smart meter data. The results could also be used to predict energy consumption under a range of “what if?” scenarios and to estimate future patterns of energy demand from energy suppliers’ customers.
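The idea of combining consumption patterns with household characteristics to flag at-risk customers can be sketched in miniature. The feature choices, thresholds, and household fields (`low_income`, `epc_band`) below are invented for illustration and are not the project's actual algorithm, which the abstract leaves unspecified.

```python
# Hedged sketch (not the project's algorithm): derive simple daily-profile
# features from 48 half-hourly smart meter readings, then apply a toy rule
# combining consumption and household data to flag potential fuel poverty.
from statistics import mean

def profile_features(readings):
    """readings: 48 half-hourly kWh values for one day."""
    daytime = readings[16:36]   # roughly 08:00-18:00
    evening = readings[36:44]   # roughly 18:00-22:00
    total = sum(readings)
    return {
        "total_kwh": total,
        "daytime_share": sum(daytime) / total if total else 0.0,
        "evening_share": sum(evening) / total if total else 0.0,
    }

def flag_at_risk(features, household):
    # Toy heuristic: unusually low overall usage in a poorly insulated,
    # low-income household may signal self-rationing rather than efficiency.
    return (features["total_kwh"] < 8.0
            and household.get("low_income", False)
            and household.get("epc_band", "D") in {"E", "F", "G"})
```

For example, a flat profile of 0.1 kWh per half-hour (4.8 kWh/day) is flagged when paired with a low-income household in EPC band F, but not for an otherwise identical household that is not low-income; in the real study such rules would be learned from linked administrative data rather than hand-set.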

Dynamics and equilibria in Twitter: Analyzing geographical lexical spread

Jacopo Rocchi from Aston University will present research into Dynamics and equilibria in Twitter: Analyzing geographical lexical spread. Read the summary below.

Understanding social processes such as the propagation of data in social networks is of both scientific and practical value. We use an information-theoretic approach, transfer entropy on rank vectors (TERV), to study the propagation of new words across geographical areas in the United States. The main objects of our analysis are microblogging time series of word occurrences. These data provide the opportunity to study the dynamics of social processes, expressed through the propagation of new words, and the transfer entropy method allows one to infer the structure of the network on which these processes take place.

TERV is a well-established method that has been used to identify connectivity networks in different contexts, such as functional areas in the brain and financial indices in markets. The main questions we address are identifying the influential areas that act as the best spreaders of social influence, and the typical time scale on which the information flows. Answering these questions could improve current accounts of how linguistic innovations spread, as well as identify their leaders.

Finally, one of the problems we are particularly interested in is the existence of islands of equilibrium in real data networks, i.e. parts of the network whose dynamics are dominated by coherent contributions from nodes within the island. These islands have recently been studied in analytical frameworks and play an important role in the dynamics of complex systems, because their long-time behaviour can be described in terms of the well-known equilibrium laws. Areas of the network that act as spreaders of innovations are good candidates to be part of such islands, as they turn out to be symmetrically and strongly connected.
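The directionality that transfer entropy captures can be illustrated with a minimal estimator. This is a plain histogram-based transfer entropy with history length 1 on binary symbols, not the TERV rank-vector method the study uses, and the two series below are invented: `y` simply copies `x` with a one-step lag, so information should flow from `x` to `y` and not the other way round.

```python
# Hedged sketch: a plain-histogram transfer entropy estimator on symbol
# sequences (history length 1). The study itself uses TERV (transfer
# entropy on rank vectors); this toy version only illustrates the idea
# of directed information flow between two time series.
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Estimate TE(X -> Y) in bits from two equal-length symbol sequences."""
    triples, pairs_yy, pairs_yx, singles_y = Counter(), Counter(), Counter(), Counter()
    n = len(y) - 1
    for t in range(n):
        triples[(y[t + 1], y[t], x[t])] += 1
        pairs_yy[(y[t + 1], y[t])] += 1
        pairs_yx[(y[t], x[t])] += 1
        singles_y[y[t]] += 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n                               # p(y_{t+1}, y_t, x_t)
        p_y1_given_yx = c / pairs_yx[(y0, x0)]        # p(y_{t+1} | y_t, x_t)
        p_y1_given_y = pairs_yy[(y1, y0)] / singles_y[y0]  # p(y_{t+1} | y_t)
        te += p_joint * log2(p_y1_given_yx / p_y1_given_y)
    return te

# Invented example: y lags x by one step, so x drives y.
x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)  # large: x_t fully determines y_{t+1}
te_yx = transfer_entropy(y, x)  # small: y adds little about x's future
```

On this example `te_xy` clearly exceeds `te_yx`, which is how, applied to word-occurrence series, such estimates let one infer directed links between geographical areas.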

The ECB and eurozone monetary politics: Policy consensus in EMU revisited

Sebastian Diessner from London School of Economics will present research into The ECB and eurozone monetary politics: Policy consensus in EMU revisited. Read the summary below.

We look at the politics of central banking in the eurozone from an uncommon perspective. Much of the literature that influenced the design of Europe’s Economic and Monetary Union (EMU) assumed that there was a risk of inflation-biased governments seeking an undesirable interference of monetary and fiscal policy in the eurozone, which some refer to pejoratively as “monetary politics”. In contrast, we shed light on the reverse matter, namely the European Central Bank (ECB) as monetary policy-maker trying to exert influence on fiscal policy developments. Employing quantitative text analysis methods, namely probabilistic topic modelling, we examine a dataset of introductory statements to ECB press conferences, presentations of the annual report to the European Parliament and monetary dialogue meetings. We find that the ECB is increasingly invested in fiscal policy concerns, in particular since the height of the eurozone crisis, and thus call into question the old underpinnings of EMU’s strict separation of monetary and fiscal policy.
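The paper's core measurement, a shift in the ECB's attention toward fiscal policy over time, relies on probabilistic topic modelling (such as LDA). As a much simpler stand-in, the toy sketch below tracks the share of fiscal-policy vocabulary in two invented statements from different years; the term list, the statements, and the years are illustrative assumptions, not the paper's data or method.

```python
# Hedged sketch: a crude keyword-share measure of fiscal-policy attention,
# standing in for the paper's probabilistic topic-model output. The term
# list and the statements below are invented toy data.
FISCAL_TERMS = {"fiscal", "deficit", "debt", "consolidation", "budgetary"}

statements = {  # year -> tokenized introductory statement (toy data)
    2006: "price stability remains the primary objective of monetary policy".split(),
    2012: ("fiscal consolidation and debt sustainability are essential while "
           "monetary policy maintains price stability").split(),
}

def fiscal_share(tokens):
    """Fraction of tokens that belong to the fiscal vocabulary."""
    return sum(t in FISCAL_TERMS for t in tokens) / len(tokens)

shares = {year: fiscal_share(toks) for year, toks in statements.items()}
```

In the toy data the 2012 statement devotes a visibly larger share of its vocabulary to fiscal terms than the 2006 one; a topic model does the analogous comparison without a hand-built term list, by learning the topics from the corpus itself.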