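I have to look up some scores and assign percentile values based on a fixed lookup table.
I have been working on this for a while and have read this and this SO thread, but they did not solve my problem. The difficulty is that a raw score can be larger than any value in the lookup table, in which case the largest percentile value should be assigned.
I have a lookup table like this,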
lookup <- structure(list(Percentile = c(99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 1), ACB = c(24, 19, 18, 17, 16, NA, 15, NA, 14, NA, NA, 13, NA, NA, NA, 12, NA, 11, 10, 9, 7), DFG = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 20, 19, 17, 14, 12), EIH = c(35, 30, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, NA, 14, NA, 13, 12, NA), GKJ = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 19, 18, 17, 15, 14), Total = c(112, 99, 91, 86, 82, 79, 76, 75, 73, 71, 69, 67, 66, 65, 63, 61, 59, 55, 51, 46, 39)), .Names = c("Percentile", "ACB", "DFG", "EIH", "GKJ", "Total"), row.names = c("99+", "95", "90", "85", "80", "75", "70", "65", "60", "55", "50", "45", "40", "35", "30", "25", "20", "15", "10", "5", "1"), class = "data.frame")
lookup
    Percentile ACB DFG EIH GKJ Total
99+         99  24  49  35  49   112
95          95  19  39  30  39    99
90          90  18  36  27  36    91
85          85  17  33  26  33    86
80          80  16  31  25  31    82
75          75  NA  30  24  30    79
70          70  15  29  23  29    76
65          65  NA  28  22  28    75
60          60  14  27  21  27    73
55          55  NA  26  20  26    71
50          50  NA  25  19  25    69
45          45  13  NA  18  NA    67
40          40  NA  24  17  24    66
35          35  NA  23  16  23    65
30          30  NA  22  15  22    63
25          25  12  21  NA  21    61
20          20  NA  20  14  19    59
15          15  11  19  NA  18    55
10          10  10  17  13  17    51
5            5   9  14  12  15    46
1            1   7  12  NA  14    39
and raw data that looks like this,
rawS_1 <- structure(list(ACB = 28, DFG = 39, EIH = 31, GKJ = NA_real_, Total = NA_real_), .Names = c("ACB", "DFG", "EIH", "GKJ", "Total"), row.names = "RawScore for ID 1", class = "data.frame")
rawS_1
                  ACB DFG EIH GKJ Total
RawScore for ID 1  28  39  31  NA    NA
rawS_2 <- structure(list(ACB = 29, DFG = 51, EIH = 56, GKJ = 60, Total = 169), .Names = c("ACB", "DFG", "EIH", "GKJ", "Total"), row.names = "RawScore for ID 2", class = "data.frame")
rawS_2
                  ACB DFG EIH GKJ Total
RawScore for ID 2  29  51  56  60   169
and this is what I would like to end up with,
                  ACB DFG EIH GKJ Total
RawScore for ID 1  12  39  19  NA    NA
Percentile, ID 1   25  95  50  NA    NA

                  ACB DFG EIH GKJ Total
RawScore for ID 2  29  51  56  60   169
Percentile, ID 2   99  99  99  99    99
I have tried using merge() with all.x = TRUE and suffixes = c(".x", ".y"), but I keep getting results I don't want. Any help would be greatly appreciated.
Answer 0 (score: 2)
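I think you are better off treating this not as a merge problem but as a problem of creating a function: you need a function that, given (for example) a raw ACB value, returns the percentile. Fortunately, R has a function designed to build a function from a table of numbers: approxfun.
The following code uses lapply to create one such function per column, then shows how to call the new functions: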
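vars <- names(lookup)[-1]  # every column except Percentile
lookup_funs <- lapply(vars, function(var) {
  # pair each raw score with its percentile and drop rows with no score
  df <- data.frame(x = lookup[[var]], y = lookup$Percentile)
  df <- df[complete.cases(df), ]
  # step function; rule = 2 clamps out-of-range scores to the nearest end value
  approxfun(df$x, df$y, method = "constant", rule = 2)
})
names(lookup_funs) <- vars
lookup_funs$ACB(c(12, 29))
lookup_funs$Total(169)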
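To get from these per-column functions to the two-row layout asked for in the question, one option is to apply each function to the matching raw-score column. The following is only a sketch under a few assumptions: it rebuilds the question's lookup table with data.frame() so that it runs on its own, and percentile_row is a hypothetical helper name that does not appear in the original answer.

```r
# Lookup table from the question (same values, rebuilt with data.frame()).
lookup <- data.frame(
  Percentile = c(99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 1),
  ACB   = c(24, 19, 18, 17, 16, NA, 15, NA, 14, NA, NA, 13, NA, NA, NA, 12, NA, 11, 10, 9, 7),
  DFG   = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 20, 19, 17, 14, 12),
  EIH   = c(35, 30, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, NA, 14, NA, 13, 12, NA),
  GKJ   = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 19, 18, 17, 15, 14),
  Total = c(112, 99, 91, 86, 82, 79, 76, 75, 73, 71, 69, 67, 66, 65, 63, 61, 59, 55, 51, 46, 39)
)

# One step function per score column, as in the answer above.
vars <- names(lookup)[-1]
lookup_funs <- lapply(vars, function(var) {
  keep <- !is.na(lookup[[var]])
  approxfun(lookup[[var]][keep], lookup$Percentile[keep],
            method = "constant", rule = 2)
})
names(lookup_funs) <- vars

# Hypothetical helper: percentiles for a one-row data frame of raw scores.
# NA raw scores stay NA.
percentile_row <- function(raw) {
  vapply(names(raw), function(v) lookup_funs[[v]](raw[[v]]), numeric(1))
}

rawS_2 <- data.frame(ACB = 29, DFG = 51, EIH = 56, GKJ = 60, Total = 169,
                     row.names = "RawScore for ID 2")
pct_2 <- percentile_row(rawS_2)

# Stack raw scores and percentiles into a two-row matrix.
rbind(RawScore = unlist(rawS_2), Percentile = pct_2)
```

Because rule = 2 clamps at both ends, a score below the smallest table value would receive the lowest percentile in that column. Whether that is what you want is not stated in the question, so treat it as an assumption of this sketch.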
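Answer 1 (score: 1)
The basic strategy is to use !is.na(vec) to index both the score vector and the percentile vector, applying the same mask to each before reversing so the two stay aligned. Here is a single case. For an ACB input of 11, which result do you prefer?
> rev(lookup$Percentile[!is.na(lookup$ACB)])[
    findInterval(11, c(-Inf, rev(lookup$ACB[!is.na(lookup$ACB)]), Inf))]
[1] 25
> rev(lookup$Percentile[!is.na(lookup$ACB)])[
    findInterval(11, c(-Inf, rev(lookup$ACB[!is.na(lookup$ACB)]), Inf)) - 1]
[1] 15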
This gets you most of the way for one row of data:
> for (i in names(rawS_1)) {
+   print(rawS_1[i])
+   keep <- !is.na(lookup[[i]])
+   print(rev(lookup$Percentile[keep])[findInterval(rawS_1[[i]], rev(lookup[[i]][keep]))])
+ }
                  ACB
RawScore for ID 1  28
[1] 99
                  DFG
RawScore for ID 1  39
[1] 95
                  EIH
RawScore for ID 1  31
[1] 95
                  GKJ
RawScore for ID 1  NA
[1] NA
                  Total
RawScore for ID 1    NA
[1] NA
Depending on whether you subtract 1 from the index, the indexing can overrun at the high end of the scale, so once you have decided which result you want to see, you should probably add an extra element to the lookup vector.
> for (i in names(rawS_2)) {
+   print(rawS_2[i])
+   keep <- !is.na(lookup[[i]])
+   print(rev(lookup$Percentile[keep])[findInterval(rawS_2[[i]], rev(lookup[[i]][keep]))])
+ }
                  ACB
RawScore for ID 2  29
[1] 99
                  DFG
RawScore for ID 2  51
[1] 99
                  EIH
RawScore for ID 2  56
[1] 99
                  GKJ
RawScore for ID 2  60
[1] 99
                  Total
RawScore for ID 2   169
[1] 99
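The findInterval loop can also be folded into a small vectorised function. This is a rough sketch only: score_to_percentile is a hypothetical name, and returning NA for scores below the smallest table value is an assumption, since the question only says what should happen above the table.

```r
# Hypothetical helper, not part of the original answer.
# table_scores and table_percentiles are parallel vectors; NA scores are dropped.
score_to_percentile <- function(score, table_scores, table_percentiles) {
  keep   <- !is.na(table_scores)
  ord    <- order(table_scores[keep])          # findInterval needs ascending breaks
  breaks <- table_scores[keep][ord]
  pct    <- table_percentiles[keep][ord]
  idx <- findInterval(score, breaks)           # 0 when score < min(breaks)
  idx[which(idx == 0)] <- NA                   # assumption: below-table scores give NA
  pct[idx]
}

# ACB column and percentiles from the question's lookup table.
percentile <- c(99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 1)
acb <- c(24, 19, 18, 17, 16, NA, 15, NA, 14, NA, NA, 13, NA, NA, NA, 12, NA, 11, 10, 9, 7)

res <- score_to_percentile(c(28, 11, 6, NA), acb, percentile)
res
```

Here 28 lies above the table and gets the top percentile, 11 matches a table entry exactly, and both 6 (below the table) and the missing score come back as NA.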