Assigning values based on an "insufficient" lookup table

Time: 2013-01-25 03:40:34

Tags: r merge lookup

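I have to look up some scores and assign percentile values based on a fixed lookup table.

I have been trying to solve this for a while and have read this and this SO thread, but they did not solve my problem. My problem is that a raw score can be larger than any value in the lookup table, in which case the largest percentile value should be assigned.

I have a lookup table like this,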

lookup <- structure(list(Percentile = c(99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 1), ACB = c(24, 19, 18, 17, 16, NA, 15, NA, 14, NA, NA, 13, NA, NA, NA, 12, NA, 11, 10, 9, 7), DFG = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 20, 19, 17, 14, 12), EIH = c(35, 30, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, NA, 14, NA, 13, 12, NA), GKJ = c(49, 39, 36, 33, 31, 30, 29, 28, 27, 26, 25, NA, 24, 23, 22, 21, 19, 18, 17, 15, 14), Total = c(112, 99, 91, 86, 82, 79, 76, 75, 73, 71, 69, 67, 66, 65, 63, 61, 59, 55, 51, 46, 39)), .Names = c("Percentile", "ACB", "DFG", "EIH", "GKJ", "Total"), row.names = c("99+", "95", "90", "85", "80", "75", "70", "65", "60", "55", "50", "45", "40", "35", "30", "25", "20", "15", "10", "5", "1"), class = "data.frame")
lookup
    Percentile ACB DFG EIH GKJ Total
99+         99  24  49  35  49   112
95          95  19  39  30  39    99
90          90  18  36  27  36    91
85          85  17  33  26  33    86
80          80  16  31  25  31    82
75          75  NA  30  24  30    79
70          70  15  29  23  29    76
65          65  NA  28  22  28    75
60          60  14  27  21  27    73
55          55  NA  26  20  26    71
50          50  NA  25  19  25    69
45          45  13  NA  18  NA    67
40          40  NA  24  17  24    66
35          35  NA  23  16  23    65
30          30  NA  22  15  22    63
25          25  12  21  NA  21    61
20          20  NA  20  14  19    59
15          15  11  19  NA  18    55
10          10  10  17  13  17    51
5            5   9  14  12  15    46
1            1   7  12  NA  14    39

and raw data that looks like this,

rawS_1 <- structure(list(ACB = 28, DFG = 39, EIH = 31, GKJ = NA_real_, Total = NA_real_), .Names = c("ACB", "DFG", "EIH", "GKJ", "Total"), row.names = "RawScore for ID 1", class = "data.frame")
rawS_1
                  ACB DFG EIH GKJ Total
RawScore for ID 1  28  39  31  NA    NA

rawS_2 <- structure(list(ACB = 29, DFG = 51, EIH = 56, GKJ = 60, Total = 169), .Names = c("ACB", "DFG", "EIH", "GKJ", "Total"), row.names = "RawScore for ID 2", class = "data.frame")
rawS_2
                  ACB DFG EIH GKJ Total
RawScore for ID 2  29  51  56  60   169

and this is what I would like to end up with,

                  ACB DFG EIH GKJ Total
RawScore for ID 1  12  39  19  NA    NA
Percentile, ID 1   25  95  50  NA    NA
                  ACB DFG EIH GKJ Total
RawScore for ID 2  29  51  56  60   169
Percentile, ID 2   99  99  99  99    99

I have tried using merge() with all.x = TRUE and suffixes = c(".x", ".y"), but I keep getting results I don't want. Any help would be appreciated.

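2 Answers:

Answer 0: (score: 2)

I think you are better off treating this not as a merge problem but as a problem of creating a function: you need a function that, given a raw value for (say) ACB, returns the percentile. Fortunately, R has a function designed to build such a function from a table of numbers: approxfun.

The following code uses lapply to create one function per column, and then shows how to call the new functions: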

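vars <- names(lookup)[-1]  # every column except Percentile
lookup_funs <- lapply(vars, function(var) {
  df <- data.frame(x = lookup[[var]], y = lookup$Percentile)
  df <- df[complete.cases(df), ]  # drop percentiles with no score in this column
  # Step function; rule = 2 clamps out-of-range scores to the nearest end of the table
  approxfun(df$x, df$y, method = "constant", rule = 2)
})
names(lookup_funs) <- vars

lookup_funs$ACB(c(12, 29))  # 25 99
lookup_funs$Total(169)      # 99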
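To get from these per-column functions to the two-row layout asked for in the question, one option is to loop over the columns of a raw-score row. The sketch below is self-contained, so it rebuilds a cut-down lookup table (only the ACB and EIH columns from the question) and uses a made-up raw-score row; the helper name percentile_row() is hypothetical and not part of the original answer.

```r
# Cut-down copy of the question's lookup table (ACB and EIH only).
lookup <- data.frame(
  Percentile = c(99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50,
                 45, 40, 35, 30, 25, 20, 15, 10, 5, 1),
  ACB = c(24, 19, 18, 17, 16, NA, 15, NA, 14, NA, NA,
          13, NA, NA, NA, 12, NA, 11, 10, 9, 7),
  EIH = c(35, 30, 27, 26, 25, 24, 23, 22, 21, 20, 19,
          18, 17, 16, 15, NA, 14, NA, 13, 12, NA)
)

# One step function per score column, built the same way as in the answer.
vars <- names(lookup)[-1]
lookup_funs <- lapply(vars, function(var) {
  ok <- !is.na(lookup[[var]])
  approxfun(lookup[[var]][ok], lookup$Percentile[ok],
            method = "constant", rule = 2)
})
names(lookup_funs) <- vars

# Hypothetical helper: append a percentile row to a one-row raw-score data frame.
percentile_row <- function(raw, funs) {
  pct <- lapply(names(raw), function(var) funs[[var]](raw[[var]]))
  names(pct) <- names(raw)
  out <- rbind(raw, as.data.frame(pct))
  rownames(out) <- c(rownames(raw), "Percentile")
  out
}

# Illustrative row: ACB is above the table maximum (24), EIH is missing.
raw <- data.frame(ACB = 28, EIH = NA_real_, row.names = "RawScore for ID 1")
result <- percentile_row(raw, lookup_funs)
print(result)
```

Missing raw scores stay NA because the functions returned by approxfun pass NA inputs through, while the out-of-range ACB score is clamped to the top percentile by rule = 2.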

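Answer 1: (score: 1)

The basic strategy is to use !is.na(vec) to index both the score vector and the percentile vector. Here is one case. For an ACB input of 11, which result would you prefer?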

> rev(lookup$Percentile[!is.na(lookup$ACB)])[
                findInterval( 11, c(-Inf,rev(lookup$ACB[!is.na(lookup$ACB)]), Inf))]
[1] 25
> rev(lookup$Percentile[!is.na(lookup$ACB)])[
                findInterval( 11, c(-Inf,rev(lookup$ACB[!is.na(lookup$ACB)]), Inf))-1]
[1] 15

This gets you most of the way for one row of data:

> for (i in names(rawS_1)) {
    ok <- !is.na(lookup[[i]])  # drop NA scores and their percentiles together
    print(rawS_1[i])
    print(rev(lookup$Percentile[ok])[findInterval(rawS_1[[i]], rev(lookup[[i]][ok]))])
  }
                  ACB
RawScore for ID 1  28
[1] 99
                  DFG
RawScore for ID 1  39
[1] 95
                  EIH
RawScore for ID 1  31
[1] 95
                  GKJ
RawScore for ID 1  NA
[1] NA
                  Total
RawScore for ID 1    NA
[1] NA

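Subtracting 1 from the index does cause the indexing to overrun at the high end of the scale, so once you have decided which result you want to see, you should probably add an extra element to the lookup vectors.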
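One way to act on that advice is to pad the vectors on the low side, so that a score below the smallest tabulated value returns a defined result instead of an empty vector. This is a minimal sketch, assuming such scores should map to NA (the question does not say what should happen there). The name lookup_percentile() is a hypothetical helper, and the two vectors are the non-missing ACB scores and their percentiles from the question, sorted by ascending score.

```r
# Non-missing ACB scores and their percentiles, in ascending score order.
acb_score <- c(7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 24)
acb_pct   <- c(1, 5, 10, 15, 25, 45, 60, 70, 80, 85, 90, 95, 99)

# Hypothetical helper: percentile of the largest tabulated score <= raw.
# Padding with -Inf (scores) and NA (percentiles) makes interval 1 mean
# "below the table", which is assumed here to yield NA.
lookup_percentile <- function(raw, score, pct) {
  c(NA, pct)[findInterval(raw, c(-Inf, score))]
}

res <- lookup_percentile(c(5, 11, 11.5, 28, NA), acb_score, acb_pct)
print(res)
```

With the padding in place no index arithmetic is needed: an exact match such as 11 and an in-between score such as 11.5 both land on percentile 15, and anything at or above 24 lands on 99.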

> for (i in names(rawS_2)) {
    ok <- !is.na(lookup[[i]])  # drop NA scores and their percentiles together
    print(rawS_2[i])
    print(rev(lookup$Percentile[ok])[findInterval(rawS_2[[i]], rev(lookup[[i]][ok]))])
  }
                  ACB
RawScore for ID 2  29
[1] 99
                  DFG
RawScore for ID 2  51
[1] 99
                  EIH
RawScore for ID 2  56
[1] 99
                  GKJ
RawScore for ID 2  60
[1] 99
                  Total
RawScore for ID 2   169
[1] 99