Posts Tagged ‘regression’

On your bike… or on your calculator

Published on Jun 23rd, 2010 by Jack

On the 2nd July three members of the Northstar London office (Matthew, Jack and Chris) will be taking part in the 2nd Marketing Industry Triathlon relay race (see old news for further details). The organisers of the event have billed it as “a great networking opportunity for the marketing world to unite in healthy, fierce competition”. Please note two words in particular: fierce and competition. To this end, Team Northstar will be looking to climb as far up the overall rankings as they can. I know what you are thinking: what on earth does this have to do with statistics? In the marketing world, statistics are commonly used to drive policy and decisions with a view to gaining a competitive advantage over rivals. Why not use the same statistical methods to gain a competitive advantage over our industry peers in a sporting context?

The following analysis was compiled on the basis of the results from last year’s Marketing Industry Triathlon relay race. Our aim is to find out which of the three triathlon disciplines (swimming, cycling and running) is key to our end position and thus the discipline in which we need to optimise performance to keep ahead of the pack. All of the analysis is based on overall finishing position and the individual (not cumulative) positions within the three disciplines.

Firstly, we have to identify whether there is a relationship between the overall finishing position and the individual positions in the swim, cycle and run. Correlating the overall finishing position with the individual swim, cycle and run positions yielded the following results:

There is a strong relationship between the overall finishing position and the finishing position in each of the disciplines. However, the relationship between the overall finishing position and the position in the cycling leg is markedly stronger than the relationships with the swim and run positions.
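As a rough illustration of this step, the sketch below correlates overall position with the position in each leg. The six teams and their positions are invented for the example and are not the real race results. Because finishing positions are already ranks with no ties, a plain Pearson correlation on them is the same as Spearman's rank correlation.

```python
# Hypothetical finishing positions for six relay teams (NOT the real results).
overall = [1, 2, 3, 4, 5, 6]
legs = {
    "swim":  [2, 1, 4, 3, 6, 5],
    "cycle": [1, 2, 3, 5, 4, 6],
    "run":   [3, 1, 2, 6, 4, 5],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Positions are untied ranks, so this equals Spearman's rank correlation.
correlations = {name: pearson(overall, pos) for name, pos in legs.items()}

for name, r in correlations.items():
    print(f"overall vs {name}: {r:.2f}")
```

In this made-up data the cycling leg shows the strongest correlation with the overall position, which mirrors the pattern described above but says nothing about the size of the real figures.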

So we now know that there are relationships between the finishing places in the individual disciplines and the overall finishing position, but surely it would be better to know how important each discipline is to where you finish? Yes it would, and on that note please cue a Shapley Value regression analysis…

A Shapley Value regression on the importance of the swim, cycle and run positions for the overall finishing position produced the following results:

This shows that the position on the cycling portion of the triathlon is considerably more important in determining the overall finishing position than the position in the other two disciplines, essentially meaning that the cycling leg of the triathlon is where the race will be won or lost (hopefully the former!).

That said, this is not a foregone conclusion. Many triathletes talk about the “4th discipline” within a triathlon: the transitions, i.e. going from the swim to the bike and from the bike to the run. Throwing these into the Shapley Value regression mix, as it were, provides the following output:

Whilst the length of time spent transitioning is relatively short compared to the time in the water or on the track, it still goes a fair way in determining the overall finishing position, soaking up variance mostly from the cycling element of the triathlon.

So what does all of the above number crunching mean for Team Northstar on the 2nd July? Well, a “tri”-ad of tips based on the above would read as follows:

  • Performance within the cycling leg will be the key driver for triathlon success
  • That said, this is not to detract from the roles of swimming and running in our end position, as both carry considerable importance for the overall finishing position
  • Relay triathlon is a team sport, with our performance in the transition zone accounting for 19% of importance in determining our overall finishing position

Multiple Regression

Published on Apr 14th, 2010 by

Multiple regression measures the extent to which the level of one thing is dependent on several others. For example, a shop may measure customer satisfaction along with quality of products, range of products and helpfulness of staff. It gives a percentage figure (called the R² value); the closer this is to 100%, the better these items are at predicting customer satisfaction. Each individual attribute is given a beta value, which indicates how much influence it has. With this statistic the items that affect satisfaction can be identified, and budget can be best allocated between them to achieve the highest increase in satisfaction.
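To make the shop example concrete, here is a minimal sketch that fits a multiple regression on standardised scores and reports the beta values and the R² figure. The survey scores are hypothetical, and the helper names (`standardize`, `corr`, `solve`) are made up for this illustration. In practice a statistics package would do this work.

```python
# Hypothetical survey scores (1-10) from eight customers, invented for illustration.
predictors = {
    "quality":     [7, 5, 8, 6, 9, 4, 7, 5],
    "range":       [6, 5, 7, 7, 8, 5, 5, 6],
    "helpfulness": [8, 4, 6, 7, 9, 5, 6, 5],
}
satisfaction = [8, 4, 8, 6, 9, 4, 7, 5]


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def corr(zx, zy):
    """Pearson correlation of two already-standardized sequences."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


names = list(predictors)
z_predictors = [standardize(predictors[name]) for name in names]
zy = standardize(satisfaction)

# Normal equations on standardized data: (predictor correlations) * betas
# = (correlations of each predictor with satisfaction).
rxx = [[corr(zi, zj) for zj in z_predictors] for zi in z_predictors]
rxy = [corr(zi, zy) for zi in z_predictors]

betas = solve(rxx, rxy)                              # standardized beta values
r_squared = sum(b * r for b, r in zip(betas, rxy))   # share of variance explained

for name, beta in zip(names, betas):
    print(f"beta for {name}: {beta:.2f}")
print(f"R squared: {r_squared:.0%}")
```

Standardising first puts all three attributes on the same scale, so the beta values can be compared with one another directly.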

Myth: 100%. An R² value close to 100% is virtually never achieved; anything approaching 50% is a good result.

Shapley Value Regression

Published on Apr 8th, 2010 by

Shapley value regression uses similar underpinnings to traditional multiple regression, in that we are trying to measure the extent to which one thing is dependent on several others. It gives each predictor (independent variable) a value which represents that predictor’s share of importance in predicting the outcome of the dependent variable. Unlike the beta values generated by traditional regressions, these values are percentage figures, with all predictor shares of importance totalling 100%. This makes the method more intuitive for the non-statistician reader.
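One common way to arrive at these shares is to average each predictor's marginal contribution to R² over every subset of the other predictors, then rescale the results to total 100%. That is an assumption here, since this entry does not spell out the calculation. The sketch below does this for the shop example from the multiple regression entry, using hypothetical survey scores and made-up helper names.

```python
from itertools import combinations
from math import factorial

# Hypothetical survey scores (1-10) from eight customers, invented for illustration.
predictors = {
    "quality":     [7, 5, 8, 6, 9, 4, 7, 5],
    "range":       [6, 5, 7, 7, 8, 5, 5, 6],
    "helpfulness": [8, 4, 6, 7, 9, 5, 6, 5],
}
satisfaction = [8, 4, 8, 6, 9, 4, 7, 5]


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def corr(zx, zy):
    """Pearson correlation of two already-standardized sequences."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


z = {name: standardize(values) for name, values in predictors.items()}
zy = standardize(satisfaction)


def r_squared(names):
    """R squared when satisfaction is regressed on the named predictors."""
    names = list(names)
    if not names:
        return 0.0  # no predictors explain nothing
    rxx = [[corr(z[a], z[b]) for b in names] for a in names]
    rxy = [corr(z[a], zy) for a in names]
    betas = solve(rxx, rxy)
    return sum(b * r for b, r in zip(betas, rxy))


def shapley_values(all_names):
    """Average marginal gain in R squared for each predictor over all subsets."""
    n = len(all_names)
    values = {}
    for name in all_names:
        others = [other for other in all_names if other != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (r_squared(subset + (name,)) - r_squared(subset))
        values[name] = total
    return values


raw = shapley_values(list(predictors))
full_r_squared = sum(raw.values())  # the raw values add up to the full-model R squared
shares = {name: 100 * value / full_r_squared for name, value in raw.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0f}% share of importance")
```

Fitting a regression for every subset of predictors gets expensive quickly, so this brute-force approach only suits a handful of predictors.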

In the fictitious example below, the Shapley value regression shows that mobile phone user satisfaction is mostly driven by ‘Innovative and functional handsets’.