Posts Tagged ‘sampling’

The Role of Statistical Integrity in the “Web 2.0” Era

Published on Oct 20th, 2010 by Jack

One of the core statistical values to be promoted as part of the forthcoming World Statistics Day is integrity. The recent rise of user-generated content and DIY internet research (both elements of what is known as “Web 2.0”) means that awareness of this particular value is especially important.

Traditionally, it has generally been the case that organisations such as governments and universities are the outlets which drop statistical bombshells. Now quite literally anyone has access to the tools to do this.

In recent years it has become possible for anyone with basic IT knowledge (and often limited statistical ability) to either host an internet survey or place a poll on a website. Users of such research tools (most of the time) do not show the same integrity that statistics professionals a) exhibit themselves or b) would like to see within their profession. Gone are robust sampling methods, as anyone can partake in “Web 2.0” research exercises, killing notions of representativeness and probability sampling. Additionally, there are often no mechanisms to stop people submitting their opinion on numerous occasions, causing data sets to be filled with duplicated information. However, and most importantly, those utilising such research tools do not treat the information they gather in a manner which is forthright. Often the data generated by such research will be presented with no regard for statistical reporting protocols. Furthermore, and most worryingly, despite its (often) unreliable nature, the information gathered by such methods may feed into the decision making process.

So what does this all mean? It means that when we promote the integrity of statistics, we must also seek to make “Web 2.0” DIY researchers aware that increasing their statistical integrity will improve not only their work, but also the reputation of all statistical investigations. It also means we must raise awareness of these methods amongst the untrained readers of statistics, so that true, integrity driven statistical research is not tarnished with the same brush as “statistics 2.0”.

Teaching an Old Dog Old Tricks

Published on Oct 4th, 2010 by Jack

Social media research has planted a sense of fear within traditional researchers. These worries are centred around one question: how do we study this new wave of information? Well, for all the “old dog”, traditional researchers out there, fear not. Many of the traditional principles of research as we have historically known them are applicable, and indeed necessary, when conducting social media research.

Sampling

The underlying purpose of sampling is to gain an insight into the views of a large population by looking at a portion of it, as researching an entire population is practically impossible. Investigating all on-line conversations is equally impractical. Therefore, the answer to this social media research dilemma is to sample. Yes, sample, as researchers have been doing since the dawn of research time. However, whereas before we would quota sample by (for example) brand ownership, in social media research we must sample by source of conversation. Furthermore, in traditional research a sample size may be n=300. In social media research n will still equal 300, but 300 conversations, not 300 respondents.
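To make sampling by source concrete, here is a minimal Python sketch. The three sources, the size of each pool and the split of the n=300 across them are all made up for illustration; the function name `sample_by_source` is hypothetical too. Within each source the conversations are drawn at random.

```python
import random
from collections import defaultdict

def sample_by_source(conversations, quotas, seed=0):
    """Draw a fixed number of conversations at random from each source."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    by_source = defaultdict(list)
    for conv in conversations:
        by_source[conv["source"]].append(conv)
    sample = []
    for source, quota in quotas.items():
        pool = by_source.get(source, [])
        # If a source holds fewer conversations than its quota, take them all.
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample

# Hypothetical data: 1,000 conversations from each of three sources.
conversations = [
    {"id": f"{source}-{i}", "source": source}
    for source in ("twitter", "blogs", "forums")
    for i in range(1000)
]
quotas = {"twitter": 150, "blogs": 100, "forums": 50}  # assumed split, sums to 300

sample = sample_by_source(conversations, quotas)
print(len(sample))  # 300
```

How the 300 is divided between sources is a research design decision; the figures above are placeholders, not a recommendation.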

Data Quality

Clean data is a fundamental part of any researcher’s investigative diet. Why should this change just because we are moving from studying conversations in a focus group to conversations on the web? Answer: it shouldn’t. In fact, as the data in social media research is user-generated, we have to be even more careful about its quality. Whereas in traditional quantitative research we have to be careful of straight liners, in social media research we have to be wary of re-tweeters and spammers to avoid analysing duplicated and irrelevant web chat. Add to this the unproven nature of social media research and the client-side scepticism about it, and the need for clean data in social media research is obvious.
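As a rough illustration of what this cleaning step might look like, the Python sketch below drops re-tweets, obvious spam and duplicated posts. The rules are assumptions made for the example: a post starting with "RT @" is treated as a re-tweet, and the spam phrases are invented. Real projects would need far more careful rules.

```python
import re

def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())

def clean_posts(posts, spam_terms=("free followers", "click here")):
    """Keep the first copy of each post, skipping re-tweets and spam."""
    seen = set()
    kept = []
    for text in posts:
        norm = normalise(text)
        if norm.startswith("rt @"):  # assumed re-tweet marker
            continue
        if any(term in norm for term in spam_terms):  # assumed spam rule
            continue
        if norm in seen:  # duplicate once case and spacing are ignored
            continue
        seen.add(norm)
        kept.append(text)
    return kept

# Hypothetical posts for illustration only.
posts = [
    "Loving the new phone",
    "RT @anna: Loving the new phone",
    "loving  the new phone",
    "Click here for FREE followers",
    "Battery life is poor",
]
print(clean_posts(posts))
# ['Loving the new phone', 'Battery life is poor']
```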

Relevant Analysis

Despite its web based origin, there is no reason for researchers to treat the analysis of social media data as something wholly separate from standard analytical practice. At the crux of social media data analysis is the idea that researchers need to filter through conversations, extracting relevant insights and content in order to meet the research goal. Does this process sound familiar? At this stage I would expect qualitative researchers to be reminded of a process they have undertaken on hundreds of occasions.

Obviously, the finer points of conducting traditional and social media research do differ. However, broadly speaking, traditional researchers need not worry. Researching social media does not require an entirely new outlook on the research process. As shown above, at a high level the old, rugged and proven research principles will suffice. That said, with the speed at which social media research is growing, this statement may only be applicable for the immediate future. Researchers, watch this space…

Sampling

Published on Apr 8th, 2010 by


The process whereby a number of observations are collected from within a wider population. This is used within research as it is too expensive, impractical and time consuming to collate the observations of an entire population.

Probability Sampling

Sampling techniques where every member of the population has a known, non-zero chance of being chosen in the sample.

Non Probability Sampling

Sampling techniques where certain elements of the population have zero chance of selection, or a chance that cannot be determined. Those who are eligible for selection and those who are not are determined by a set of pre-specified assumptions.

Random Sampling

A sampling method whereby members of the wider population are selected at random to be in the sample. This is a probability sampling technique as every member of the population has an equal opportunity to be selected.
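A minimal Python sketch of a simple random sample, assuming a hypothetical population of 500 member IDs and an arbitrary sample size of 50. It uses the standard library's `random.sample`, which draws without replacement.

```python
import random

population = list(range(1, 501))  # hypothetical population of 500 member IDs
rng = random.Random(42)           # fixed seed so the draw can be repeated

# Draw 50 members without replacement; each member is equally likely to be chosen.
sample = rng.sample(population, 50)
print(len(sample))  # 50
```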

Systematic Sampling

Also referred to as the “nth sampling technique”, whereby every nth record is selected from the population. For example, if the population is 500 and n = 5 then every 5th member of the population (5th, 10th, 15th and so on) will be selected, giving a sample of 100. Provided the starting point is selected at random, this method constitutes probability sampling.
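The same example can be sketched in Python. This is only an illustration: the function name `systematic_sample` is hypothetical, and the starting point is chosen at random from the first five records, which is the usual way of keeping the method a probability one.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Select every `interval`-th record, beginning at a random start."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random index from 0 to interval - 1
    return population[start::interval]

population = list(range(1, 501))  # hypothetical population of 500 member IDs
sample = systematic_sample(population, interval=5, seed=1)
print(len(sample))  # 100
```

With a population of 500 and an interval of 5 the sample always contains 100 records, whichever starting point is drawn.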

Quota Sampling

Within a quota sampling approach the population is segmented into mutually exclusive groups which are then targeted based on a set of predetermined criteria. As only those who meet the criteria are eligible for selection, and selection within each group is not made at random, this is a non probability method of sampling. This allows the observations to be very focused as, essentially, the researcher can determine precisely who will and will not be in their sample.
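A short Python sketch of the idea, under invented assumptions: respondents arrive one at a time, the groups are brand owners and non-owners, and the quotas are 10 and 20. Respondents are accepted in order of arrival until each quota is full, so nothing about the selection is random.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are left untouched
    sample = []
    for person in respondents:
        group = person[key]
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # every quota has reached zero
            break
    return sample

# Hypothetical respondents: every third one owns the brand.
respondents = [{"id": i, "owns_brand": i % 3 == 0} for i in range(100)]
quotas = {True: 10, False: 20}  # assumed quotas for owners and non-owners

sample = quota_sample(respondents, quotas, key="owns_brand")
print(len(sample))  # 30
```

Anyone arriving after the quotas are full has no chance of being selected, which is why the method counts as non probability sampling.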

Representative Sampling

A sample collected to reflect certain features of its wider population on a smaller scale. Such samples may, for example, represent a geographical population and be matched to it on demographics.