R - Solving Indeterminate Problems

Time: 2018-04-20 01:06:20

Tags: r

I suspect this is a fairly complex problem that may not have a simple solution and may require machine learning or other advanced techniques.

First, to explain the problem: say we have a runner who takes part in a number of outdoor races where the elements (e.g. wind) affect the athlete’s speed. If we know the runner’s baseline speed, it is easy to determine the percentage effect the elements had in each race. For example:

    Name Baseline Race1 Race2 Race3
1 Runner      100   102    98   106

The contributing element_factors for Race1, Race2 and Race3 are:

[1] 1.02 0.98 1.06

In this example, the runner had a tailwind in Race 1 that raised his speed 2% above baseline, and so on for the other races.
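As a minimal sketch of that calculation, each factor is just the race result divided by the known baseline. The variable names `baseline` and `results` are illustrative and not from the original post.

```r
# Known baseline and observed race results for a single runner
baseline <- 100
results  <- c(Race1 = 102, Race2 = 98, Race3 = 106)

# Each factor is the observed result divided by the baseline
element_factors <- results / baseline
print(element_factors)
```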

However, in the real world we don’t necessarily know a runner’s baseline speed, because all we have to go on are race results, and we don’t know how the elements affected each one.

Take, for example, the race results in the following data frame:

# NA (not the string "NA") marks a race the runner did not enter,
# so the race columns stay numeric
df <- data.frame(
  Name = c("Runner 1", "Runner 2", "Runner 3", "Runner 4", "Runner 5"),
  Baseline = c("unknown", "unknown", "unknown", "unknown", "unknown"),
  Race1 = c(101, NA, 80.8, 111.1, 95.95),
  Race2 = c(102, 91.8, NA, 112.2, NA),
  Race3 = c(95, 85.5, 76, NA, 90.25),
  Race4 = c(NA, 95.4, 84.8, 116.6, 100.7)
)

      Name Baseline  Race1 Race2 Race3 Race4
1 Runner 1  unknown 101.00 102.0 95.00    NA
2 Runner 2  unknown     NA  91.8 85.50  95.4
3 Runner 3  unknown  80.80    NA 76.00  84.8
4 Runner 4  unknown 111.10 112.2    NA 116.6
5 Runner 5  unknown  95.95    NA 90.25 100.7

What I want to do is estimate, from this data frame, each runner’s baseline speed and the factor for each race. The solution in this case would be:

Baseline <- c(100, 90, 80, 110, 95)
[1] 100  90  80 110  95

element_factors<-c(1.01,1.02,0.95,1.06)
[1] 1.01 1.02 0.95 1.06
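The model implied here is result = baseline × factor, so the complete table of results is the outer product of the two vectors, and the observed data frame is that table with some cells missing. The sketch below rebuilds it as a consistency check. It assumes a baseline of 110 for Runner 4, the value that runner's results imply (111.1 / 1.01 = 110), and the variable names are illustrative.

```r
# Stated solution; Runner 4's baseline is taken as 110 (assumption: 111.1 / 1.01)
baseline <- c(100, 90, 80, 110, 95)
element_factors <- c(1.01, 1.02, 0.95, 1.06)

# Every (runner, race) result under the model baseline * factor
expected <- outer(baseline, element_factors)
dimnames(expected) <- list(paste("Runner", 1:5), paste0("Race", 1:4))
print(expected)
```

Under these values, the cell for Runner 3 in Race4 comes out as 80 × 1.06 = 84.8.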

Setting the baseline speed to the runner’s average is overly simplistic: some runners only race in events with a tailwind, so their true baseline is lower than all of their race results.
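One possible approach, offered as a sketch and not a definitive answer, is to note that log(result) = log(baseline) + log(factor) is additive in runner and race, so an ordinary linear model on the log scale can estimate both sets of effects despite the missing cells. Two assumptions are worth stating. First, the data below uses 84.8 for Runner 3 in Race4, the value implied by a baseline of 80 and a factor of 1.06. Second, the problem is only identified up to a common scale (doubling every baseline and halving every factor gives the same results), so the fit can only recover values relative to a reference race. Here `known_race1_factor` is a hypothetical external piece of information used to fix the scale.

```r
# Race results in wide form; NA marks races a runner did not enter.
# Assumption: Runner 3's Race4 value is 84.8 (80 * 1.06).
df <- data.frame(
  Name  = paste("Runner", 1:5),
  Race1 = c(101, NA, 80.8, 111.1, 95.95),
  Race2 = c(102, 91.8, NA, 112.2, NA),
  Race3 = c(95, 85.5, 76, NA, 90.25),
  Race4 = c(NA, 95.4, 84.8, 116.6, 100.7)
)

# Reshape to long form: one row per (runner, race) result
races <- c("Race1", "Race2", "Race3", "Race4")
long <- data.frame(
  Name   = factor(rep(df$Name, times = length(races))),
  Race   = factor(rep(races, each = nrow(df))),
  Result = unlist(df[races], use.names = FALSE)
)
long <- long[!is.na(long$Result), ]

# Additive model on the log scale; Race1 is the reference race
fit <- lm(log(Result) ~ 0 + Name + Race, data = long)

co <- coef(fit)
is_race <- grepl("^Race", names(co))
rel_factors  <- c(Race1 = 1, exp(co[is_race]))  # factors relative to Race1
rel_baseline <- exp(co[!is_race])               # baseline times Race1's factor

# Fix the scale with a hypothetical known factor for Race1 (assumption)
known_race1_factor <- 1.01
baseline_hat <- rel_baseline / known_race1_factor
factors_hat  <- rel_factors * known_race1_factor
print(round(baseline_hat, 2))
print(round(factors_hat, 3))
```

Without an external anchor such as `known_race1_factor`, another common convention is to constrain the factors to have a geometric mean of 1. With noisy real data the fit would no longer be exact, and the residuals would indicate how well the multiplicative model holds.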

0 answers:

No answers