DIVA is an online resource developed by David Deacon and James Stanyer of the Centre for Research in Communication and Culture, Loughborough University, in collaboration with Business Intelligence And Strategy (B.I.A.S), and is supported by Research England's Higher Education Innovation Fund.
If using DIVA for published research we would appreciate your support by citing the paper that inspired its development:
Deacon, D. & Stanyer, J. (2018) Media Diversity and the Quantification of Qualitative Variation, Loughborough: Centre for Research in Communication and Culture, Loughborough University (Email: d.n.deacon@lboro.ac.uk for access to the paper)
ABOUT THE WEBSITE
Key Features of Website
Statistical calculations
Automatically calculates 7 diversity measures for categorical data.
Bootstrapping
Provides automated confidence intervals for diversity scores using bootstrap resampling methods.
Hypothesis Testing
Provides two-way hypothesis tests using permutation testing and resampling.
DIVA
Calculating Diversity Scores for Categorical Data
DIVA is an online resource for analysing diversity in statistical distributions that attain the nominal/categorical level of measurement.
This service allows you to enter frequency distributions with up to thirty categories to calculate:
1. Calculate diversity scores using six existing measures of qualitative variation, for example:
Simpson's D
1-\frac{\sum{n(n-1)}}{N(N-1)}
where n = the number of cases in a category and N = the total number of cases in a sample.
HREL
-\sum^k_{i=1}\frac{f_i}{N}\log_2\frac{f_i}{N}
where f_i = the number of cases in category i; the minus sign makes the final value of HREL positive.
2. Analyse and compare diversity scores and confidence intervals for two variables, and then test whether the observed differences in diversity scores are statistically significant.
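The two formulas above can be sketched directly in code. This is a minimal Python illustration of the definitions as stated, not DIVA's own implementation:

```python
from math import log2

def simpsons_d(counts):
    """Simpson's D: 1 - sum(n(n-1)) / (N(N-1)),
    where counts holds the number of cases in each category."""
    N = sum(counts)
    return 1 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

def hrel(counts):
    """Shannon entropy -sum((f_i/N) * log2(f_i/N)); the leading
    minus sign makes the final value positive."""
    N = sum(counts)
    return -sum((n / N) * log2(n / N) for n in counts if n > 0)

# A perfectly even two-category distribution:
counts = [5, 5]
print(simpsons_d(counts))  # ≈ 0.556
print(hrel(counts))        # 1.0
```

With an even split across two categories, HREL equals 1 bit of entropy; concentrating all ten cases in one category would drive both scores to zero.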
Concerns about diversity are central to many debates about democracy. Promoting diversity can improve social inclusion, enhance the plurality of public debate, guard against the concentration of power, promote equality of opportunities and so on. In communication and media studies, we see a range of diversity considerations. From an institutional perspective, these include concerns about ownership of media and creative industries, market share and convergence, who works within these industries and who has the power to regulate their structures and practices. From a citizen-based perspective, diversity considerations draw attention to questions such as the equality of public access, the needs of citizens and the preferences of consumers. Concerns about the diversity of media content and representation link with supply and demand side questions, in particular, when assessing the impact that different media environments have upon the inclusiveness of public discourse. In discussions about content diversity, attention has focused upon (but is not restricted to) the measurement of ‘source diversity’ (i.e. which individuals and institutions gain greatest prominence in media representations) and ‘content diversity’ (i.e. what issues and frames receive greatest prominence).
In most cases, concerns about diversity are about statistical scale, patterning and extent. Many quantitative measurements of diversity only attain the most basic ‘categorical’ level of measurement. The DIVA website offers a unique resource for the statistical description of diversity in categorical data and for drawing wider statistical conclusions on that basis.
Categorical data refers to the use of numbers to assign different cases to separate groups. There is no order to categorical data – all you can say is how many cases correspond to each number. Categorical data is sometimes alternatively referred to as ‘nominal data’ – i.e. numbers are used to ‘name’ a case to a category. An example of a categorical variable is an opinion poll question that asks respondents to indicate their preferred political party, assigning a different number to each political party. Categorical data are also sometimes referred to as ‘qualitative variation’ – in contrast to ‘quantitative variation’, where the numbers assigned have a continuous relationship with each other (e.g. they indicate whether a case is greater or smaller than another and, depending on the level of measurement, can be used to measure more precise mathematical relationships between cases).
The Diversity Average (DIVa) is a new measure developed by the authors of this web resource (see Deacon and Stanyer, 2018). It averages the diversity scores found for Simpson's D, HREL, the Index of Qualitative Variation (IQV) and Deviation from the Mode (DM).
Simpson's D was developed in ecological research to measure species richness and diversity in field experiments (see Simpson, 1949).
HREL emerged from information science to calculate information entropy (see Shannon, 1948).
The Index of Qualitative Variation (IQV) was developed by sociologists to measure statistical dispersion in categorical measurements (see Mueller and Schuessler, 1961).
The Herfindahl-Hirschman Index (HHI) was developed by economists to measure the extent to which markets are concentrated or competitive (see Hirschman, 1964).
The Deviation from the Mode was developed by political scientists to measure the extent to which each value in a categorical distribution varied in absolute terms from the largest category (the Mode) (see Wilcox, 1973).
The Mean Difference Analog (MDA) also emerged from political science and measures the absolute differences of all the possible pairs of variate-values in a categorical distribution (see Wilcox, 1973). Our testing has found the MDA to be the most unstable and the most susceptible to the categorisation effect noted above (see DIVa).
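As a sketch of how the DIVa average described above might be computed from its four component measures, the following assumes HREL is normalised by log2 of the number of categories so that all components sit on a 0-to-1 scale (as the ranges described under 'Diversity Scores' imply), and uses the standard Wilcox formulations of IQV and Deviation from the Mode; DIVA's own implementation may differ in detail:

```python
from math import log2

def diva(counts):
    """Average of four 0-1 diversity measures: Simpson's D, normalised
    HREL, the Index of Qualitative Variation (IQV) and Deviation from
    the Mode (DM). Requires at least two categories and N >= 2."""
    k = len(counts)
    N = sum(counts)
    p = [n / N for n in counts]
    d = 1 - sum(n * (n - 1) for n in counts) / (N * (N - 1))    # Simpson's D
    hrel = -sum(pi * log2(pi) for pi in p if pi > 0) / log2(k)  # normalised HREL
    iqv = (k / (k - 1)) * (1 - sum(pi ** 2 for pi in p))        # IQV
    dm = k * (N - max(counts)) / (N * (k - 1))                  # Deviation from the Mode
    return (d + hrel + iqv + dm) / 4

print(round(diva([5, 5]), 3))  # maximally even split: ≈ 0.889
print(round(diva([9, 1]), 3))  # heavily concentrated: much lower
```

Note that for a perfectly even split, HREL, IQV and DM all reach 1, while Simpson's D (in the sample-corrected form above) falls slightly short of 1, which is why the average is just under 0.9.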
References:
Deacon, D. and Stanyer, J. (2018) Media Diversity and the Measurement of Qualitative Variation, Loughborough: Centre for Research in Communication and Culture, Loughborough University.
Hirschman, A. (1964) ‘The paternity of an index’, American Economic Review, 54: 761.
Mueller, J. and Schuessler, K. (1961) Statistical Reasoning in Sociology, Boston: Houghton Mifflin.
Shannon, C. (1948) ‘A mathematical theory of communication’, Bell System Technical Journal, 27: 379-423.
Simpson, E. H. (1949) ‘Measurement of diversity’, Nature, 163: 688.
Wilcox, A. (1973) ‘Indices of qualitative variation and political measurement’, Western Political Quarterly, 26(2): 325-343.
Diversity Scores
With all but one of the diversity indices (the exception being HHI), diversity is measured on a scale between zero and 1. When the observations are spread evenly across all categories, the score is 1 (indicating total heterogeneity). When all the observations fall into just one category, the score is zero (indicating total homogeneity). The degree of diversity in a distribution is therefore assessed by considering the proximity of the diversity score to either 1 or zero.
With HHI, the results range from near zero to 10,000. With this measure, the lower the score, the greater the diversity of a distribution. The measure is often used in economics to assess the competitiveness of markets. The US Department of Justice considers a market with an HHI of less than 1,500 to be competitive, 1,500 to 2,500 to be moderately competitive, and 2,500 and above to be highly concentrated. The HHI is mathematically directly related to Simpson's D.
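As an illustration, the HHI can be computed as the sum of squared percentage market shares, and the Department of Justice bands above applied to the result. This is a sketch with made-up share figures:

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman Index: the sum of squared percentage market
    shares. Ranges from near 0 (fragmented) to 10,000 (monopoly)."""
    return sum(s ** 2 for s in shares_pct)

def classify(score):
    """Classify a market using the US Department of Justice thresholds."""
    if score < 1500:
        return "competitive"
    if score < 2500:
        return "moderately competitive"
    return "highly concentrated"

shares = [40, 30, 20, 10]      # four firms' percentage market shares
print(hhi(shares))             # 3000
print(classify(hhi(shares)))   # highly concentrated
```

A monopoly (one firm with 100 percent) scores 10,000; shares split evenly across many firms push the score towards zero, mirroring the inverted scale described above.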
The rows labelled 0.025 and 0.975
Below each diversity score are two further values (the rows labelled ‘0.025’ and ‘0.975’). Taken together, these form the 95 percent confidence interval used to estimate wider population values on the basis of sample data. They indicate that there is a 95 percent chance that the ‘true’ population value is located somewhere between the lower and the higher value.
(For comparing diversity scores between two variables)
Difference between diversity scores
This is the absolute difference in the diversity scores for the two distributions.
P Value
This is the probability value that indicates the likelihood that the observed difference between diversity scores occurred by chance. The P value needs to be lower than 0.05 for the difference to be deemed statistically significant. This hypothesis testing is conducted using permutation testing methods, with 10,000 resamples without replacement.
The confidence intervals are calculated using bootstrap resampling methods. The name is taken from the phrase ‘pulling oneself up by one’s own bootstraps’: the technique estimates population distributions using the only information you have available, i.e. your observed sample distributions, which you assume represent a valid estimation of the population. Bootstrapping involves generating large numbers of random resamples (with replacement) from your observed distribution. Sampling ‘with replacement’ describes a process whereby, once a value is selected for inclusion in a resample, it is returned to the selection pool, permitting the possibility that it is randomly selected again (and therefore appears more than once in that resample). The bootstrapping for this service entails calculating 1,000 resamples with replacement. Diversity scores are then calculated for each resample, the resulting scores are ranked from lowest to highest, and the 95 percent confidence interval is the range between the values found at the 2.5% and 97.5% points of the distribution.
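The bootstrap procedure just described can be sketched as follows. This simplified Python illustration uses Simpson's D as the statistic and assumes resampling of the raw observations (the standard bootstrap approach); the DIVA service performs the equivalent calculation for every measure:

```python
import random
from collections import Counter

def simpsons_d(counts):
    """Simpson's D: 1 - sum(n(n-1)) / (N(N-1))."""
    N = sum(counts)
    return 1 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

def bootstrap_ci(observations, n_resamples=1000):
    """95% bootstrap confidence interval for Simpson's D.
    `observations` is the raw list of category labels."""
    scores = []
    for _ in range(n_resamples):
        # Resample with replacement: each drawn value returns to the pool.
        resample = random.choices(observations, k=len(observations))
        scores.append(simpsons_d(list(Counter(resample).values())))
    scores.sort()
    # The values at the 2.5% and 97.5% points of the ranked scores.
    return scores[int(0.025 * n_resamples)], scores[int(0.975 * n_resamples)]

data = ["a"] * 12 + ["b"] * 5 + ["c"] * 3
lower, upper = bootstrap_ci(data)
print(lower, upper)  # brackets the observed D for this sample (≈ 0.58)
```

Because each resample is random, the interval bounds vary slightly from run to run; with 1,000 resamples they stabilise enough for descriptive purposes.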
The hypothesis testing in DIVA uses permutation testing procedures. Permutation testing shares similarities with bootstrapping, in that it estimates population distributions by repeatedly resampling observed values. The principal differences are that it is used to compare differences between two variables (hence its role in hypothesis testing) and that the resampling is conducted without replacement (i.e. once a value is resampled it cannot be included in subsequent selections). The procedure starts with the observed difference in diversity scores between two sets of data. It then calculates a permutation distribution, i.e. the dispersion of values you would expect to see if there were no difference between the two sets. To do this, the observed distributions of the two variables are shuffled into a pooled variable (e.g. two variables with 15 categories would create a pooled distribution of 30 categories). Two new, separate variable distributions are then created (mirroring the original sample structure) by randomly selecting values without replacement from the pooled data. The measure under assessment (e.g. a diversity score) is then calculated for both resampled distributions and the value of the second resample is subtracted from the first, producing a positive or negative value. This process is repeated many times to produce the permutation distribution (DIVA conducts 10,000 resamples). The observed difference is then mapped onto the permutation distribution to obtain a p value, calculated as the proportion of resampled values that give a result at least as large as the observed difference being tested.
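The permutation procedure above can be sketched in Python as follows. Simpson's D stands in for the measure under assessment, category counts are the pooled and reallocated values (as the description above suggests), and the p value counts resampled differences at least as large in absolute terms as the observed one; this is an illustrative reading, not DIVA's own code:

```python
import random

def simpsons_d(counts):
    """Simpson's D: 1 - sum(n(n-1)) / (N(N-1))."""
    N = sum(counts)
    return 1 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

def permutation_test(counts_a, counts_b, n_resamples=10_000):
    """P value for the observed difference in Simpson's D between two
    frequency distributions, via a permutation distribution."""
    observed = abs(simpsons_d(counts_a) - simpsons_d(counts_b))
    pooled = counts_a + counts_b          # shuffle the counts into one pool
    at_least_as_large = 0
    for _ in range(n_resamples):
        random.shuffle(pooled)            # reallocation without replacement,
        new_a = pooled[:len(counts_a)]    # mirroring the original structure
        new_b = pooled[len(counts_a):]
        if abs(simpsons_d(new_a) - simpsons_d(new_b)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_resamples

# An even two-category split versus a heavily concentrated one:
p = permutation_test([10, 10], [18, 2])
print(p)  # p below 0.05 would indicate a statistically significant difference
```

With only four pooled counts there are few distinct reallocations, so this toy example does not reach significance; real distributions with many categories produce much finer-grained permutation distributions.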
For an introduction to diversity measures we recommend:
Deacon, D. and Stanyer, J. (2018) Media Diversity and the Measurement of Qualitative Variation, Loughborough: Centre for Research in Communication and Culture, Loughborough University.
McDonald, D. and Dimmick, J. (2003) ‘The conceptualization and measurement of diversity’, Communication Research, 30(1): 60-79.
Wilcox, A. (1973) ‘Indices of qualitative variation and political measurement’, Western Political Quarterly, 26(2): 325-343.
For an overview of wider issues concerning media diversity, see:
Roessler, P. (2008) ‘Media content diversity: Conceptual issues and future directions for communications research’. In: Beck, CS (ed.) Communication Yearbook 31. New York: Lawrence Erlbaum, 464–520.
McQuail, D. and van Cuilenburg, J. (1983) ‘Diversity as a media policy goal: a strategy for evaluative research and a Netherlands case study’, Gazette, 31(3): 145-162.
THE TEAM
The authors of this website
Prof. David Deacon
Professor of Communication and Media Analysis
BA (Hons) Communication Studies, MA in Mass Communications, PhD. David Deacon teaches on the BSc in Communication and Media Studies, the MA in Media and Cultural Analysis and the MA in Global Media and Cultural Industries. More information about his publications, research, information for students and other matters can be found on his home page.
Prof. James Stanyer
Professor of Communication and Media Analysis
Head of the Department of Social Sciences. James Stanyer gained a PhD in Government from the London School of Economics in 1999. His research and teaching interests lie primarily in the areas of national and transnational political communication.
Peter Dean
Project Manager
Experienced entrepreneur and business owner with a demonstrated history of providing financial, strategic, technological and people skills to drive digital transformation and shareholder value growth for his clients.
Zhao (Orange) Gao
Application Developer
She is an enthusiastic PhD Computer Science student at Loughborough University with a particular interest in Computer Vision, Artificial Intelligence and Data Mining.
CONTACT
If using DIVA for publication, we would appreciate your support by citing the paper that inspired its development:
Deacon, D. & Stanyer, J. (2018) Media Diversity and the Quantification of Qualitative Variation, Loughborough: Centre for Research in Communication and Culture, Loughborough University (Email: d.n.deacon@lboro.ac.uk for access to the paper)
DIVA harnesses research expertise from Loughborough University and is supported by Research England's Higher Education Innovation Fund.