I have 2 data tables:
library(data.table)
dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
"value2", "value2", "value2", "value3", "value3"),
valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
label = c(101:104, 201:203, 301:302))
> dt1
id value1 value2 value3
1: 1 11 21 36
2: 2 12 22 37
3: 3 13 23 38
4: 4 14 24 39
5: 5 15 25 40
> dt2
name valueMin valueMax label
1: value1 10 13 101
2: value1 13 14 102
3: value1 14 18 103
4: value1 18 20 104
5: value2 21 24 201
6: value2 24 25 202
7: value2 25 27 203
8: value3 36 38 301
9: value3 38 42 302
My expected result is as follows: add the labels from dt2 to dt1, based on the fact that value1 in dt1 lies between valueMin and valueMax in dt2 (and dt2$name matches value1).
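Here is the solution I have (it gives the correct result):
library(dplyr)  # provides %>% and select()
varName <- "value1"
dt2_temp <- dt2[name == varName,]
dt1[dt2_temp, on = .(value1 > valueMin, value1 <= valueMax), nomatch = 0] %>%
select(id, label)
id label
1: 1 101
2: 2 101
3: 3 101
4: 4 102
5: 5 103
I want to do the same for all the remaining columns in dt1 (value2, value3), using a loop, to get the label column for each. So I need to replace the reference to the column name value1 with the name stored in varName, something like:
dt1[dt2_temp, on = .(varName > valueMin, varName <= valueMax), nomatch = 0]
Unfortunately, I did not succeed with any of: plain varName, eval(varName), as.name(varName). Each of these attempts fails with an error. Do you have an idea for a solution?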
Answer 0 (score: 10)
Why not do it in one go?
A possible solution:
melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]
which gives:
   id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1:  1           11           21           36        101         NA         NA
2:  2           12           22           37        101        201        301
3:  3           13           23           38        101        201        301
4:  4           14           24           39        102        201        302
5:  5           15           25           40        103        202        302
Answer 1 (score: 2)
melt(dt1,1)[dt2, on = .(value> valueMin, value <= valueMax,variable=name), nomatch = 0]
id variable value value.1 label
1: 1 value1 10 13 101
2: 2 value1 10 13 101
3: 3 value1 10 13 101
4: 4 value1 13 14 102
5: 5 value1 14 18 103
6: 2 value2 21 24 201
7: 3 value2 21 24 201
8: 4 value2 21 24 201
9: 5 value2 24 25 202
10: 2 value3 36 38 301
11: 3 value3 36 38 301
12: 4 value3 38 42 302
13: 5 value3 38 42 302
Answer 2 (score: 2)
One approach could be:
library(data.table)
dcast(dt2[melt(dt1, id.vars = 1), #left join of long form of dt1 and original dt2
.( id, variable, label), #only keep concerned columns from merged table
on = .(name = variable, valueMax >= value, valueMin < value)], #join conditions
id ~ variable,
value.var = "label") #long to wide format using dcast to get the final result
which gives:
id value1 value2 value3
1: 1 101 NA NA
2: 2 101 201 301
3: 3 101 201 301
4: 4 102 201 302
5: 5 103 202 302
Answer 3 (score: 2)
Posting another approach that programmatically constructs the on string (see the on argument in ?data.table).
Note that there should not be any spaces around the variable names.
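A minimal sketch of this idea, not necessarily the answerer's exact code: the on argument also accepts a character vector of join conditions such as "value1>valueMin", so the conditions can be assembled with paste0() inside a loop. The names value_cols, on_cond, matched, new_col and result are made up for this example, and the label_ prefix for the new columns is an arbitrary choice.

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = rep(c("value1", "value2", "value3"), times = c(4, 3, 2)),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# Every column of dt1 except the key is treated as a value column (assumption).
value_cols <- setdiff(names(dt1), "id")
result <- copy(dt1)  # work on a copy so dt1 itself is left unchanged

for (varName in value_cols) {
  # Join conditions as strings, written without spaces around the names,
  # e.g. "value1>valueMin" and "value1<=valueMax".
  on_cond <- c(paste0(varName, ">valueMin"), paste0(varName, "<=valueMax"))

  # Keep only the intervals that belong to the current column.
  dt2_temp <- dt2[name == varName]

  # Non-equi join: one row per (id, label) pair that falls inside an interval.
  matched <- dt1[dt2_temp, on = on_cond, nomatch = 0L, .(id, label)]

  # Attach the labels to the result by id; ids without a match stay NA.
  new_col <- paste0("label_", varName)
  result[matched, on = "id", (new_col) := i.label]
}

print(result)
```

With the sample data, the three label_ columns should agree with the wide table shown in Answer 2, including the NA entries for id 1 in value2 and value3.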
Answer 4 (score: 0)
A mix of tidyverse and fuzzyjoin:
library(tidyverse)
library(fuzzyjoin)
dt2 %>% fuzzy_inner_join(
gather(dt1,name, value,-1),
by=c("name",valueMin="value",valueMax="value"),
list(function(x,y) x == y,
function(x,y) x < y,
function(x,y) x >= y)) %>%
select(id,name.x,label) %>%
distinct %>%
spread(name.x,label)
# id value1 value2 value3
# 1: 1 101 NA NA
# 2: 2 101 201 301
# 3: 3 101 201 301
# 4: 4 102 201 302
# 5: 5 103 202 302