Joining data.tables using column names stored in a variable

Date: 2018-07-05 08:31:53

Tags: r join data.table

I have two data.tables:

library(data.table)
dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1", 
                            "value2", "value2", "value2", "value3", "value3"), 
              valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38), 
              valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42), 
              label = c(101:104, 201:203, 301:302))
> dt1
   id value1 value2 value3
1:  1     11     21     36
2:  2     12     22     37
3:  3     13     23     38
4:  4     14     24     39
5:  5     15     25     40
> dt2
     name valueMin valueMax label
1: value1       10       13   101
2: value1       13       14   102
3: value1       14       18   103
4: value1       18       20   104
5: value2       21       24   201
6: value2       24       25   202
7: value2       25       27   203
8: value3       36       38   301
9: value3       38       42   302

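My expected result is as follows: the label from dt2 is matched to dt1 whenever value1 in dt1 lies between valueMin and valueMax in dt2 (and dt2$name matches the column name). Here is the solution I have (it gives the correct result):

library(dplyr)  # provides %>% and select()
varName <- "value1"
dt2_temp <- dt2[name == varName,]
dt1[dt2_temp, on = .(value1 > valueMin, value1 <= valueMax), nomatch = 0] %>% 
  select(id, label)
   id label
1:  1   101
2:  2   101
3:  3   101
4:  4   102
5:  5   103

I would like to do the same for all remaining columns in dt1 (value2 and value3), using a loop, to get the label column for each. To do that I need to replace the hard-coded reference to value1 with the column name stored in varName, something like:

dt1[dt2_temp, on = .(varName > valueMin, varName <= valueMax), nomatch = 0]

Unfortunately, I have not succeeded with any of: plain varName, eval(varName), or as.name(varName). Each attempt raises an error. Do you have an idea for a workaround?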

5 answers:

Answer 0: (score: 10)

Why not do it in one go, without a loop?

A possible solution:

melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
                  ][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]

which gives:

   id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1:  1           11           21           36        101         NA         NA
2:  2           12           22           37        101        201        301
3:  3           13           23           38        101        201        301
4:  4           14           24           39        102        201        302
5:  5           15           25           40        103        202        302
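The chain above is dense, so here is a step-by-step sketch of the same idea that keeps only the labels. The intermediate names long and labels_wide are illustrative and not part of the original answer; the data is copied from the question.

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c(rep("value1", 4), rep("value2", 3), rep("value3", 2)),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# Step 1: reshape dt1 to long form, one row per (id, variable) pair
long <- melt(dt1, id.vars = "id")

# Step 2: update join; rows of long that fall in (valueMin, valueMax]
# for the matching name receive the label, all other rows stay NA
long[dt2, on = .(variable = name, value > valueMin, value <= valueMax),
     lbl := i.label]

# Step 3: back to wide form, one label column per original value column
labels_wide <- dcast(long, id ~ variable, value.var = "lbl")
print(labels_wide)
```

Because step 2 is an update join (it uses :=), the value column of long keeps the original dt1 values, and only the new lbl column is filled in.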

Answer 1: (score: 2)

melt(dt1,1)[dt2, on = .(value> valueMin, value <= valueMax,variable=name), nomatch = 0]

   id variable value value.1 label
 1:  1   value1    10      13   101
 2:  2   value1    10      13   101
 3:  3   value1    10      13   101
 4:  4   value1    13      14   102
 5:  5   value1    14      18   103
 6:  2   value2    21      24   201
 7:  3   value2    21      24   201
 8:  4   value2    21      24   201
 9:  5   value2    24      25   202
10:  2   value3    36      38   301
11:  3   value3    36      38   301
12:  4   value3    38      42   302
13:  5   value3    38      42   302
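In the output above, the columns value and value.1 hold valueMin and valueMax from dt2 rather than the original values from dt1, which is how data.table reports the join columns of a non-equi join. A minimal sketch of one way to keep the original value instead, using the x. prefix in j (the result name res is illustrative; the data is copied from the question):

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c(rep("value1", 4), rep("value2", 3), rep("value3", 2)),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# x.value refers to the value column of the left table (the melted dt1),
# so the result shows the matched value instead of the interval bounds
res <- melt(dt1, 1)[dt2,
                    on = .(value > valueMin, value <= valueMax, variable = name),
                    .(id, variable, value = x.value, label),
                    nomatch = 0]
print(res)
```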

Answer 2: (score: 2)

One way could be:

library(data.table)

dcast(dt2[melt(dt1, id.vars = 1),    #left join of long form of dt1 and original dt2
          .( id, variable, label),   #only keep concerned columns from merged table
          on = .(name = variable,  valueMax >= value, valueMin < value)],  #join conditions
      id ~ variable, 
      value.var = "label")           #long to wide format using dcast to get the final result

which gives

   id value1 value2 value3
1:  1    101     NA     NA
2:  2    101    201    301
3:  3    101    201    301
4:  4    102    201    302
5:  5    103    202    302

Answer 3: (score: 2)

Posting another approach that constructs the on condition strings programmatically (see the on argument in ?data.table).

Note that there should not be any spaces around the variable names.
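A minimal sketch of what such a loop could look like, assuming the character-vector form of on (strings such as "value1>valueMin"). The loop structure, the helper names onCond and newCol, and the label_ prefix for the new columns are illustrative assumptions, not the answer's own code; the data is copied from the question.

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c(rep("value1", 4), rep("value2", 3), rep("value3", 2)),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

for (varName in c("value1", "value2", "value3")) {
  # Build the join conditions as strings, with no spaces around the names
  onCond <- c(paste0(varName, ">valueMin"), paste0(varName, "<=valueMax"))
  newCol <- paste0("label_", varName)
  # Update join: add the matching label to dt1 by reference
  dt1[dt2[name == varName], on = onCond, (newCol) := i.label]
}
print(dt1)
```

Rows of dt1 with no matching interval (for example value2 = 21, since the lower bound is exclusive) are left as NA in the new column.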

Answer 4: (score: 0)

A mix of tidyverse and fuzzyjoin:

library(tidyverse)
library(fuzzyjoin)

dt2 %>% fuzzy_inner_join(
  gather(dt1,name, value,-1),
  by=c("name",valueMin="value",valueMax="value"),
  list(function(x,y) x == y,
       function(x,y) x < y,
       function(x,y) x >= y)) %>%
  select(id,name.x,label) %>%
  distinct %>%
  spread(name.x,label)

#    id value1 value2 value3
# 1:  1    101     NA     NA
# 2:  2    101    201    301
# 3:  3    101    201    301
# 4:  4    102    201    302
# 5:  5    103    202    302