Posts Tagged ‘Correlation’

On your bike… or on your calculator

Published on Jun 23rd, 2010 by Jack

On 2nd July three members of the Northstar London office (Matthew, Jack and Chris) will be taking part in the 2nd Marketing Industry Triathlon relay race (see old news for further details). The organisers of the event have billed it as “a great networking opportunity for the marketing world to unite in healthy, fierce competition”. Please note two words in particular: fierce and competition. To this end, Team Northstar will be looking to climb as far up the overall rankings as they can. I know what you are thinking… what on earth does this have to do with statistics? In the marketing world, statistics are commonly used to drive policy and decisions with a view to gaining a competitive advantage over rivals. Why not use the same statistical methods to gain a competitive advantage over our industry peers in a sporting context?

The following analysis was compiled on the basis of the results from last year’s Marketing Industry Triathlon relay race. Our aim is to find out which of the three triathlon disciplines (swimming, cycling and running) is key to our end position and thus the discipline in which we need to optimise performance to keep ahead of the pack. All of the analysis is based on overall finishing position and the individual (not cumulative) positions within the three disciplines.

Firstly, we have to identify whether there is a relationship between the overall finishing position and the individual positions in the swim, cycle and run. Correlating overall finishing position with individual swim position, individual cycle position and individual run position yielded the following results:

There is a strong relationship between the overall finishing position and the finishing position in each of the disciplines. However, the relationship between the overall finishing position and the position in the cycling leg is markedly stronger than the relationship between the overall finishing position and the swim and run positions.
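For anyone who wants to try this step at home, here is a minimal Python sketch. The finishing positions below are made up for six imaginary teams (they are not last year's results), and the `pearson` helper is our own illustration rather than the tool used for the analysis above. Because finishing positions are ranks with no ties, correlating them directly is equivalent to a Spearman rank correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical finishing positions for six teams (1 = fastest); NOT real race data.
overall = [1, 2, 3, 4, 5, 6]
legs = {
    "swim":  [2, 1, 5, 3, 6, 4],
    "cycle": [1, 2, 3, 5, 4, 6],
    "run":   [3, 1, 2, 6, 4, 5],
}

# Correlate the overall position with the position in each individual leg.
correlations = {leg: pearson(overall, positions) for leg, positions in legs.items()}
for leg, r in correlations.items():
    print(f"{leg:>5}: r = {r:.2f}")
```

In this invented data set the cycle positions track the overall positions most closely, so the cycle leg comes out with the highest r. With real results you would simply swap in the actual position lists.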

So we now know that there are relationships between the finishing places in the individual disciplines and the overall finishing position, but surely it would be better to know how much each discipline contributes to where you finish? Yes it would, and on that note please cue a Shapley Value regression analysis…

A Shapley Value regression of the overall finishing position on the swim, cycle and run positions produced the following results:

This shows that the position on the cycling portion of the triathlon is considerably more important in determining the overall finishing position than the position in the other two disciplines, essentially meaning that the cycling leg of the triathlon is where the race will be won or lost (hopefully the former!).
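For the curious, one common way to set up a Shapley Value regression is to fit a regression for every possible combination of predictors and credit each predictor with the average increase in R² it brings when added. The Python sketch below follows that formulation using only the standard library. It is an illustration under stated assumptions: the positions are made up for six imaginary teams, the function names (`solve`, `r_squared`, `shapley_importance`) are ours, and the analysis above may well have been run with different software or settings.

```python
from itertools import combinations
from math import factorial

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y):
    """R-squared of an ordinary least squares fit of y on the given predictor columns."""
    if not columns:
        return 0.0  # a model with no predictors explains nothing
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept + predictors
    k = len(rows[0])
    xtx = [[sum(row[p] * row[q] for row in rows) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(rows)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def shapley_importance(predictors, y):
    """Share of R-squared credited to each predictor, averaged over all subsets."""
    names = list(predictors)
    k = len(names)
    shares = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                base = [predictors[s] for s in subset]
                gain = r_squared(base + [predictors[name]], y) - r_squared(base, y)
                total += weight * gain
        shares[name] = total
    return shares

# Hypothetical finishing positions for six teams (1 = fastest); NOT real race data.
overall = [1, 2, 3, 4, 5, 6]
predictors = {
    "swim":  [2, 1, 5, 3, 6, 4],
    "cycle": [1, 2, 3, 5, 4, 6],
    "run":   [3, 1, 2, 6, 4, 5],
}

shares = shapley_importance(predictors, overall)
percent = {name: 100 * value / sum(shares.values()) for name, value in shares.items()}
for name, value in percent.items():
    print(f"{name:>5}: {value:.1f}% of importance")
```

The shares always add up to the R² of the full model, which is why they can be rescaled into percentages of importance like the ones quoted in this post. Adding the transitions as a fourth predictor would just mean adding another entry to `predictors`.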

That said, this is not a foregone conclusion. Many triathletes talk about the “4th discipline” within a triathlon: the transitions, i.e. going from the swim to the bike and the bike to the run. Throwing these into the Shapley Value regression mix, as it were, provides the following output:

Whilst the length of time spent transitioning is relatively short compared to the time in the water or on the track, it still goes a fair way in determining the overall finishing position, soaking up variance mostly from the cycling element of the triathlon.

So what does all of the above number crunching mean for Team Northstar on the 2nd July? Well, a “tri”-ad of tips based on the above would read as follows:

  • Performance within the cycling leg will be the key driver for triathlon success
  • That said, this is not to detract from the roles of swimming and running in our end position, as both carry a substantial share of the importance in determining the overall finishing position
  • Relay triathlon is a team sport, with our performance in the transition zone accounting for 19% of the importance in determining our overall finishing position

Correlation

Published on Apr 24th, 2010 by

Correlation indicates the extent to which two things are related. For example, when it is cold ice-cream sales are low, and as the temperature increases so do ice-cream sales. A correlation is reported as an r value and can be anywhere between -1 and 1. A correlation of 1 means the two things rise together in a perfect straight-line relationship (ice-cream sales increase as temperature increases). A correlation of -1 means that as one increases, the other decreases in a perfect straight-line relationship (for example, the more exercise someone does, the lower their risk of heart problems becomes). A correlation of 0 means there is no straight-line relationship between the two things. As a rule of thumb, ignoring the sign, a correlation (r) below 0.4 is typically considered low, from 0.4 to 0.6 is considered good and above 0.6 very good.
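To make the r value a little more concrete, here is a small Python sketch that computes it by hand. All of the numbers are invented purely for illustration, and `pearson_r` is our own helper name, not part of any particular package.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up numbers, for illustration only.
temperature = [10, 15, 20, 25, 30]        # degrees Celsius
ice_cream_sales = [20, 30, 40, 50, 60]    # rises in a straight line with temperature

exercise_hours = [1, 2, 3, 4, 5]          # hours of exercise per week
risk_score = [50, 45, 40, 35, 30]         # invented score, falls in a straight line

r_up = pearson_r(temperature, ice_cream_sales)    # approximately 1
r_down = pearson_r(exercise_hours, risk_score)    # approximately -1
print(f"temperature vs ice-cream sales: r = {r_up:.2f}")
print(f"exercise vs risk score:         r = {r_down:.2f}")
```

Note that sales go up by 2 for every extra degree here, so the two things are not changing at the same rate, yet r still comes out at about 1. What r measures is how closely the points follow a straight line, not how steep that line is. Real data is rarely this tidy, so real r values usually fall somewhere in between.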

Myth: Causality. You can’t say, based on correlation alone, that a change in one thing will cause a change in another. For example, people who do more exercise may also have a better diet and not smoke, so these other things help to lower the risk of heart problems, not just the exercise. To separate out the contribution of these other factors you need Multiple Regression.

Figure: three scatter plots showing a strong positive correlation (r=1), no correlation (r=0) and a strong negative correlation (r=-1).