     M1                     M2                     M3
id   M1_1  M1_2  M1_diff    M2_1  M2_2  M2_diff    M3_1  M3_2  M3_diff
A    55.2  60.8  5.6        66.7  69.8  3.1        58.5  60.3  1.8
B    56.8  55.4  1.4        62.8  63.9  1.1        65.7  69.8  4.1
C    52.3  54.3  2.0        53.8  55.9  1.1        56.7  57.9  1.2
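I have to find which of M1, M2, M3 is the best fit for each of A, B, C. The criterion is that Mi_1 and Mi_2 should be as high as possible and Mi_diff as low as possible (i = 1, 2, 3). For id B, for example, it could be the second model. I have to pick exactly one M per id. B has the lowest difference for M2, so I would choose M2 for B. M3 could also be chosen because it has higher accuracy, but its difference is large. I can't come up with any general algorithm to do this. We could set a cutoff on the diff value and then choose M accordingly. For example, with 1.5 as the lower limit on diff, M3 would be the best fit for id B.
The data is fairly large, with nearly 1000 unique ids, so I can't go through them by hand. I suspect there is a simple solution that I'm not seeing. Can anyone help? I'm using R for the computation.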
Answer 0 (score: 1)
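You just need to come up with an equation that captures your criteria. For example, since you want Mi_1 and Mi_2 to be as high as possible but their difference to be as low as possible, you might want to maximize:
Mi_1 * Mi_2 / abs(Mi_1 - Mi_2)
You can add coefficients to this equation to increase the importance of any term.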
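As a rough illustration of what adding coefficients could look like, the sketch below puts an exponent on each term. The helper name `weighted_score` and the parameters `w_acc` and `w_diff` are assumptions made for this example, not part of the formula above. The values are those of id B from the question.

```r
# Hypothetical helper: w_acc and w_diff are assumed exponents that control
# how much the accuracy product and the difference count towards the score.
weighted_score <- function(x1, x2, w_acc = 1, w_diff = 1) {
  (x1 * x2)^w_acc / abs(x1 - x2)^w_diff
}

# id B from the question: (M2_1, M2_2) and (M3_1, M3_2)
b_m2 <- c(62.8, 63.9)
b_m3 <- c(65.7, 69.8)

# Default weights: the small difference of M2 outweighs the accuracy of M3
m2_default <- weighted_score(b_m2[1], b_m2[2])
m3_default <- weighted_score(b_m3[1], b_m3[2])
print(m2_default > m3_default)

# w_diff = 0 ignores the difference entirely, so the higher accuracy of M3 wins
m2_no_diff <- weighted_score(b_m2[1], b_m2[2], w_diff = 0)
m3_no_diff <- weighted_score(b_m3[1], b_m3[2], w_diff = 0)
print(m2_no_diff < m3_no_diff)
```

Values of `w_diff` between 0 and 1 move gradually between these two extremes, which is one way to express how much a large difference should be tolerated in exchange for higher accuracy.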
In R:
# Set RNG seed for reproducibility
set.seed(12345)

# Generate some random example data
num.rows <- 1000
df <- data.frame(M1_1 = runif(num.rows, 0, 100),
                 M1_2 = runif(num.rows, 0, 100),
                 M2_1 = runif(num.rows, 0, 100),
                 M2_2 = runif(num.rows, 0, 100),
                 M3_1 = runif(num.rows, 0, 100),
                 M3_2 = runif(num.rows, 0, 100))
df$M1_diff <- abs(df$M1_1 - df$M1_2)
df$M2_diff <- abs(df$M2_1 - df$M2_2)
df$M3_diff <- abs(df$M3_1 - df$M3_2)

# We call apply with 1 as the second parameter,
# meaning the function will be applied to each row
res <- apply(df, 1, function(row) {
  # Our criterion, modify at will:
  # product of the two scores divided by their absolute difference
  M1_prod <- row["M1_1"] * row["M1_2"] / row["M1_diff"]
  M2_prod <- row["M2_1"] * row["M2_2"] / row["M2_diff"]
  M3_prod <- row["M3_1"] * row["M3_2"] / row["M3_diff"]
  # Which is the maximum? Returns 1, 2 or 3
  which.max(c(M1_prod, M2_prod, M3_prod))
})
Output:
> head(df)
      M1_1       M1_2     M2_1     M2_2     M3_1      M3_2   M1_diff  M2_diff   M3_diff
1 72.09039  7.7756704 95.32788 43.06881 27.16464 18.089266 64.314719 52.25907  9.075377
2 87.57732 84.3713648 62.17875 86.29595 62.93161 18.878981  3.205954 24.11720 44.052625
3 76.09823  0.6813684 53.16722 25.12324 85.90863 72.700354 75.416864 28.04398 13.208273
4 88.61246 35.1184204 89.20926 76.34523 36.97298  3.062528 53.494036 12.86403 33.910451
5 45.64810 68.6061032 19.58807 69.40719 28.21637 58.466682 22.958007 49.81913 30.250311
6 16.63718 25.4086494 88.43795 73.68140 81.37349 75.001685  8.771471 14.75656  6.371807
> head(res)
[1] 2 1 3 2 1 3
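To tie this back to the three ids in the question, here is a minimal sketch that applies the same criterion to the table from the question instead of random data. The names `scores`, `models`, `crit` and `best` are assumptions made for this example, and the diff columns are used exactly as given in the table. It uses `max.col` on a score matrix rather than a row-wise `apply`, which is only one possible way to write it.

```r
# Data copied from the table in the question (diff columns taken as given)
scores <- data.frame(
  id      = c("A", "B", "C"),
  M1_1    = c(55.2, 56.8, 52.3),
  M1_2    = c(60.8, 55.4, 54.3),
  M1_diff = c(5.6, 1.4, 2.0),
  M2_1    = c(66.7, 62.8, 53.8),
  M2_2    = c(69.8, 63.9, 55.9),
  M2_diff = c(3.1, 1.1, 1.1),
  M3_1    = c(58.5, 65.7, 56.7),
  M3_2    = c(60.3, 69.8, 57.9),
  M3_diff = c(1.8, 4.1, 1.2)
)

models <- c("M1", "M2", "M3")

# One column of criterion values per model: Mi_1 * Mi_2 / Mi_diff
crit <- sapply(models, function(m) {
  scores[[paste0(m, "_1")]] * scores[[paste0(m, "_2")]] /
    scores[[paste0(m, "_diff")]]
})

# For each row (id), take the model whose criterion value is largest
scores$best <- models[max.col(crit, ties.method = "first")]
print(scores[, c("id", "best")])
```

With this criterion the sketch picks M3 for A, M2 for B and M3 for C. The choice for B matches the asker's own pick. For C the values for M2 and M3 are nearly tied, so the result there is sensitive to any change in the formula or its coefficients.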