I suspect this problem does not have a simple solution and may need machine learning or another advanced technique to solve.
First, to explain the problem: say we have a runner who takes part in a number of outdoor races where the elements (e.g. wind) affect the athlete's speed. If we know the runner's baseline speed, it's easy to work out the percentage effect the elements had in each race, for example:
Name Baseline Race1 Race2 Race3
1 Runner 100 102 98 106
The contributing element_factors for Race1, Race2 and Race3 are:
[1] 1.02 0.98 1.06
In this example the runner had a tail wind in Race 1 that lifted his speed 2% above baseline, a head wind in Race 2 that cut it by 2%, and so on.
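The known-baseline case above comes down to a single division. A minimal sketch, assuming the variable names `baseline`, `races` and `pct_effect`, which are only illustrative:

```r
# Known baseline speed and observed race speeds from the example above
baseline <- 100
races <- c(Race1 = 102, Race2 = 98, Race3 = 106)

# Each element factor is the observed speed divided by the baseline
element_factors <- races / baseline
print(element_factors)

# The same information expressed as a percentage change from the baseline
pct_effect <- (element_factors - 1) * 100
print(pct_effect)
```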
However, in the real world we don't know a runner's baseline speed, because all we have to go on are their race results, and we don't know how much the elements affected each result.
Take, for example, the race results in the following data frame:
df <- data.frame(Name = c("Runner 1", "Runner 2", "Runner 3", "Runner 4", "Runner 5"),
                 Baseline = c("unknown", "unknown", "unknown", "unknown", "unknown"),
                 # NA is left unquoted so the race columns stay numeric
                 Race1 = c(101, NA, 80.8, 111.1, 95.95),
                 Race2 = c(102, 91.8, NA, 112.2, NA),
                 Race3 = c(95, 85.5, 76, NA, 90.25),
                 # Runner 3 is 84.8 here (80 * 1.06), consistent with the factors given further down
                 Race4 = c(NA, 95.4, 84.8, 116.6, 100.7))
      Name Baseline  Race1 Race2 Race3 Race4
1 Runner 1  unknown 101.00 102.0 95.00    NA
2 Runner 2  unknown     NA  91.8 85.50  95.4
3 Runner 3  unknown  80.80    NA 76.00  84.8
4 Runner 4  unknown 111.10 112.2    NA 116.6
5 Runner 5  unknown  95.95    NA 90.25 100.7
What I want to do is estimate from this data frame each runner's baseline speed and the element factor for each race. The solution in this case would be:
Baseline<-c(100,90,80,110,95)
[1] 100 90 80 110 95
element_factors<-c(1.01,1.02,0.95,1.06)
[1] 1.01 1.02 0.95 1.06
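One way to make the target concrete is to treat each result as baseline × element factor, which becomes additive on the log scale and can be fitted with ordinary least squares. The sketch below is only a starting point, and it rests on assumptions that are not in the data above: the names `results`, `long` and `fit` are made up for illustration, Runner 3's Race4 value is taken as 84.8 so that it agrees with a baseline of 80 and a factor of 1.06, and the race factors are scaled so that their geometric mean is 1. That last choice matters because the results only pin down the ratios between factors. Multiplying every baseline by a constant and dividing every factor by it fits equally well, so the absolute values depend on the normalisation chosen.

```r
# Race results in wide form; NA marks a race the runner did not enter.
# Assumption: Runner 3's Race4 value is taken as 84.8 (= 80 * 1.06).
results <- data.frame(
  Name  = paste("Runner", 1:5),
  Race1 = c(101, NA, 80.8, 111.1, 95.95),
  Race2 = c(102, 91.8, NA, 112.2, NA),
  Race3 = c(95, 85.5, 76, NA, 90.25),
  Race4 = c(NA, 95.4, 84.8, 116.6, 100.7)
)

# Reshape to long form: one row per (runner, race) result, dropping NAs
race_cols <- paste0("Race", 1:4)
long <- data.frame(
  Name  = rep(results$Name, times = length(race_cols)),
  Race  = rep(race_cols, each = nrow(results)),
  Speed = unlist(results[race_cols], use.names = FALSE)
)
long <- na.omit(long)

# log(Speed) = log(baseline of runner) + log(factor of race).
# "0 +" drops the intercept so every runner gets its own coefficient;
# Race1 is the reference level, so its log factor is 0 before rescaling.
fit <- lm(log(Speed) ~ 0 + Name + Race, data = long)
coefs <- coef(fit)
log_runner <- coefs[grep("^Name", names(coefs))]
log_race   <- c(RaceRace1 = 0, coefs[grep("^Race", names(coefs))])

# Assumed normalisation: geometric mean of the race factors equals 1.
# The shift moves the common scale from the factors into the baselines.
shift <- mean(log_race)
element_factors <- exp(log_race - shift)
Baseline <- exp(log_runner + shift)

print(round(Baseline, 2))
print(round(element_factors, 4))
```

With this normalisation the estimates land close to the values listed above but not exactly on them, because the listed factors do not have a geometric mean of exactly 1. The ratios between races (for example Race2 relative to Race1) do not depend on the normalisation.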
Setting the baseline to each runner's average result is too simplistic: some runners, such as Runner 4, only race in events with a tail wind, so their true baseline is lower than every one of their race results.
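The bias in the simple average can be seen with Runner 4, who only appears in Race1, Race2 and Race4. A small sketch using the results and race factors given above; the names `runner4`, `race_factor`, `naive_baseline` and `implied_baseline` are only illustrative:

```r
# Runner 4's three results and the element factors of those races
runner4     <- c(Race1 = 111.1, Race2 = 112.2, Race4 = 116.6)
race_factor <- c(Race1 = 1.01,  Race2 = 1.02,  Race4 = 1.06)

# Naive estimate: the plain average of the observed results
naive_baseline <- mean(runner4)

# Dividing each result by its race factor removes the tail wind
implied_baseline <- runner4 / race_factor

print(naive_baseline)    # about 113.3
print(implied_baseline)  # about 110 for every race
```

The average sits above all three wind-adjusted values because every race this runner entered had a factor greater than 1.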