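Update column values by looking up values from another table

Time: 2018-08-27 23:05:34

Tags: r lookup

I want to look up a score in the ScoreLU table for each value in the df table. For example, the value 1.3730682 in DSCRpd should return a score of 60 from ScoreLU, because it is greater than 1.35 but less than the next breakpoint, 1.65.

The Leverage column, on the other hand, has to be handled in descending order, i.e. the first value, 2.01, should return 60 because it is less than 2.5 but greater than the next breakpoint, 2.0.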

[df][1]
      DSCRpd Leverage         TCB
1  1.3730682 2.010122 -1590099.11
2  1.0449597 2.680051   493370.85
3  1.0311141 4.790531    21594.63
4  1.3923007 3.279903  -499326.76
5  1.6443938 3.853003   988780.79
6  0.6265976 1.814359  1003736.73
7  2.1025253 4.412528  1245305.83
8  1.2872873 2.074424  -688305.83
9  0.5088294 2.504510  1406986.68
10 1.7794307 3.724905  1132513.33


[ScoreLU][2]
      Score DSCRpd Leverage     TCB
 1:       0   0.65      5.0       0
 2:      10   0.80      4.5  100000
 3:      20   0.95      4.0  250000
 4:      30   1.10      3.5  500000
 5:      40   1.20      3.0  850000
 6:      50   1.26      2.5 1250000
 7:      60   1.35      2.0 1700000
 8:      70   1.65      1.5 2300000
 9:      80   2.00      1.0 2900000
10:      90   2.30      0.5 3600000

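Yes, this is just like Excel's VLOOKUP function with ascending and descending order options. Help.

I have a function that gets the values correctly, but how do I apply it to each column so that the results go into the appropriate column? For example, the scores for DSCRpd should go into a column named DSCRpdScore.

This function looks at column number CN of the data frame 'df' and returns the appropriate value based on x.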

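# Needs dplyr for %>%, select() and filter().
# Note: the df argument is never used; the lookup table dtScoreLU is read
# from the calling environment.
myFUN = function(df, x, CN){
  if (dtScoreLU[1,CN] <= median(dtScoreLU[,CN])){
    # Ascending column: largest breakpoint that is <= x
    myMax = max(dtScoreLU[(dtScoreLU[,CN] <= x),CN])
    return(dtScoreLU %>% select(Score) %>%
             filter(dtScoreLU[,CN] == myMax))
  } else {
    # Descending column: smallest breakpoint that is >= x
    myMin = min(dtScoreLU[as.vector(dtScoreLU[,CN] >= x),CN])
    return(dtScoreLU %>% select(Score) %>%
             filter(dtScoreLU[,CN] == myMin))
  }
}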

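1 answer:

Answer 0: (score: 0)

As I understand it, this looks like a good use case for data.table's rolling join feature.

"I want to look up a score in the ScoreLU table for each value in the df table. For example, the value 1.3730682 in DSCRpd should return a score of 60 from ScoreLU, because it is greater than 1.35 but less than the next breakpoint, 1.65."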

library(data.table)
# For each row of df, take the Score of the largest ScoreLU$DSCRpd that is <= df$DSCRpd
ScoreLU[, .(DSCRpd, Score)][df, , on = 'DSCRpd', roll = TRUE]

       DSCRpd Score Leverage         TCB
 1: 1.3730682    60 2.010122 -1590099.11
 2: 1.0449597    20 2.680051   493370.85
 3: 1.0311141    20 4.790531    21594.63
 4: 1.3923007    60 3.279903  -499326.76
 5: 1.6443938    60 3.853003   988780.79
 6: 0.6265976    NA 1.814359  1003736.73
 7: 2.1025253    80 4.412528  1245305.83
 8: 1.2872873    50 2.074424  -688305.83
 9: 0.5088294    NA 2.504510  1406986.68
10: 1.7794307    70 3.724905  1132513.33
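
"The Leverage column, on the other hand, has to be handled in descending order, i.e. the first value, 2.01, should return 60 because it is less than 2.5 but greater than the next breakpoint, 2.0."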

ScoreLU[, .(Leverage, Score)][df, , on = 'Leverage', roll = TRUE]

    Leverage Score    DSCRpd         TCB
 1: 2.010122    60 1.3730682 -1590099.11
 2: 2.680051    50 1.0449597   493370.85
 3: 4.790531    10 1.0311141    21594.63
 4: 3.279903    40 1.3923007  -499326.76
 5: 3.853003    30 1.6443938   988780.79
 6: 1.814359    70 0.6265976  1003736.73
 7: 4.412528    20 2.1025253  1245305.83
 8: 2.074424    60 1.2872873  -688305.83
 9: 2.504510    50 0.5088294  1406986.68
10: 3.724905    30 1.7794307  1132513.33
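
Note that roll = TRUE matches each Leverage value to the largest breakpoint at or below it, regardless of the row order of ScoreLU, which is why 2.01 maps to 2.0 and a score of 60. If the intent were instead what the else-branch of myFUN computes (the smallest breakpoint at or above the value), a minimal sketch would use roll = -Inf. This is an assumption about a possible alternative reading, not something the question asks for; the tables are rebuilt here with only the columns needed so the block runs on its own.

```r
library(data.table)

ScoreLU <- data.table(
  Score    = c(0L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L),
  Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5)
)
df <- data.table(
  Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003,
               1.814359, 4.412528, 2.074424, 2.50451, 3.724905)
)

# roll = TRUE: last breakpoint at or below the value is carried forward
score_down <- ScoreLU[df, Score, on = "Leverage", roll = TRUE]

# roll = -Inf: next breakpoint at or above the value is carried backward
score_up <- ScoreLU[df, Score, on = "Leverage", roll = -Inf]

data.table(Leverage = df$Leverage, score_down, score_up)
```

For the first row, 2.010122, score_down is 60 (breakpoint 2.0) and score_up is 50 (breakpoint 2.5).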

If you like, you can combine them:

ScoreLU[, .(Leverage, Score)][
  ScoreLU[, .(DSCRpd, Score)][
    df, ,on = 'DSCRpd', roll = TRUE
    ], , on = 'Leverage', roll = TRUE]

    Leverage Score    DSCRpd i.Score         TCB
 1: 2.010122    60 1.3730682      60 -1590099.11
 2: 2.680051    50 1.0449597      20   493370.85
 3: 4.790531    10 1.0311141      20    21594.63
 4: 3.279903    40 1.3923007      60  -499326.76
 5: 3.853003    30 1.6443938      60   988780.79
 6: 1.814359    70 0.6265976      NA  1003736.73
 7: 4.412528    20 2.1025253      80  1245305.83
 8: 2.074424    60 1.2872873      50  -688305.83
 9: 2.504510    50 0.5088294      NA  1406986.68
10: 3.724905    30 1.7794307      70  1132513.33
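
The question also asks for the scores to land in named columns such as DSCRpdScore. Here is a minimal sketch of one way to do that, assuming the same rolling join is acceptable for every column: extract the Score vector from each join and assign it to df by reference. The name LeverageScore, the loop variable col_name, and the loop itself are illustrative choices that are not in the original post, and the tables are rebuilt with data.table() so the block runs on its own.

```r
library(data.table)

ScoreLU <- data.table(
  Score    = c(0L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L),
  DSCRpd   = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3),
  Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5)
)
df <- data.table(
  DSCRpd   = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938,
               0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307),
  Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003,
               1.814359, 4.412528, 2.074424, 2.50451, 3.724905)
)

for (col_name in c("DSCRpd", "Leverage")) {
  # Rolling join on one column; returns one Score per row of df, in df's order
  scores <- ScoreLU[df, Score, on = col_name, roll = TRUE]
  # Add the result as <column>Score, e.g. DSCRpdScore
  df[, (paste0(col_name, "Score")) := scores]
}

df
```

With the sample data, DSCRpdScore and LeverageScore hold the same values as the i.Score and Score columns in the combined output above, and df keeps its original row order.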

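To control what happens at the ends of the two Score lookups (values that fall outside the range of breakpoints), you can set the rollends argument as needed. If you have time, I would also give ?data.table a read-through. It is very helpful for getting started, as the syntax can be a bit opaque at times.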
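
As a sketch of what rollends changes: rows 6 and 9 of df have DSCRpd values below the smallest breakpoint (0.65), so the plain rolling join returns NA for them. Assuming those rows should receive the lowest score rather than NA (the question does not say), setting rollends = c(TRUE, TRUE) also rolls the first breakpoint backward. Only the columns needed are rebuilt here.

```r
library(data.table)

ScoreLU <- data.table(
  Score  = c(0L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L),
  DSCRpd = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3)
)
df <- data.table(
  DSCRpd = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938,
             0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307)
)

# Default ends for roll = TRUE: values below 0.65 have nothing to roll from
default_ends <- ScoreLU[df, Score, on = "DSCRpd", roll = TRUE]

# rollends = c(TRUE, TRUE): values below 0.65 take the first breakpoint's score
both_ends <- ScoreLU[df, Score, on = "DSCRpd", roll = TRUE,
                     rollends = c(TRUE, TRUE)]

data.table(DSCRpd = df$DSCRpd, default_ends, both_ends)
```

The two results differ only in rows 6 and 9, where both_ends is 0 instead of NA.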

I am still new to data.table, so others with more expertise are welcome to chime in.

Data

ScoreLU <- structure(list(Score = c(0L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L),
                          DSCRpd = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3),
                          Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5),
                          TCB = c(0L, 100000L, 250000L, 500000L, 850000L, 1250000L, 1700000L, 2300000L, 2900000L, 3600000L)),
                     .Names = c("Score", "DSCRpd", "Leverage", "TCB"), row.names = c(NA, -10L), class = c("data.table", "data.frame"))

df <- structure(list(DSCRpd = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938, 0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307),
                     Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003, 1.814359, 4.412528, 2.074424, 2.50451, 3.724905),
                     TCB = c(-1590099.11, 493370.85, 21594.63, -499326.76, 988780.79, 1003736.73, 1245305.83, -688305.83, 1406986.68, 1132513.33)),
                .Names = c("DSCRpd", "Leverage", "TCB"),
                row.names = c(NA, -10L), class = c("data.table", "data.frame" ))