Suppose you have data like this:
library(data.table)
fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))
setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")
fruits
FruitID Fruit
1: 1 Apple
2: 2 Banana
3: 3 Strawberry
colors
ColorID FruitID Color
1: 1 1 Red
2: 2 1 Yellow
3: 3 1 Green
4: 4 2 Yellow
5: 5 3 Red
tastes
TasteID FruitID Taste
1: 1 1 Sweeet
2: 2 1 Sour
3: 3 3 Sweet
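I frequently need to perform a left outer join on data like this. For example, "give me all fruits and their colors" requires me to write the following (perhaps there is a better way?):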
setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")
Three lines of code for such a simple and frequent task seems excessive, so I wrote a function, myLeftJoin:
myLeftJoin <- function(tbl1, tbl2){
# Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)
oldkey <- key(tbl2)
setkeyv(tbl2, key(tbl1))
result <- tbl2[tbl1, allow.cartesian=TRUE]
setkeyv(tbl2, oldkey)
return(result)
}
I can use it like this:
myLeftJoin(fruits, colors)
ColorID FruitID Color Fruit
1: 1 1 Red Apple
2: 2 1 Yellow Apple
3: 3 1 Green Apple
4: 4 2 Yellow Banana
5: 5 3 Red Strawberry
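How can I extend this function so that I can pass it an arbitrary number of tables and get the chained left outer join of all of them? Something like myLeftJoin(tbl1, ...).
For example, I would like the result of myLeftJoin(fruits, colors, tastes) to be equivalent to: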
setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")
result
TasteID FruitID Taste ColorID Color Fruit
1: 1 1 Sweeet 1 Red Apple
2: 2 1 Sour 1 Red Apple
3: 1 1 Sweeet 2 Yellow Apple
4: 2 1 Sour 2 Yellow Apple
5: 1 1 Sweeet 3 Green Apple
6: 2 1 Sour 3 Green Apple
7: NA 2 NA 4 Yellow Banana
8: 3 3 Sweet 5 Red Strawberry
Perhaps there is an elegant solution using functions from the data.table package that I have missed? Thanks.
(Edit: fixed an error in my data)
Answer 0 (score: 9)
I just committed a new feature in data.table v1.9.5 that lets us join without setting keys (that is, we can specify the columns to join on directly, without having to call setkey() first).
With this, the task is simply:
require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
# FruitID Fruit TasteID Taste ColorID Color
# 1: 1 Apple 1 Sweeet 1 Red
# 2: 1 Apple 2 Sour 1 Red
# 3: 1 Apple 1 Sweeet 2 Yellow
# 4: 1 Apple 2 Sour 2 Yellow
# 5: 1 Apple 1 Sweeet 3 Green
# 6: 1 Apple 2 Sour 3 Green
# 7: 2 NA NA NA 4 Yellow
# 8: 3 Strawberry 3 Sweet 5 Red
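The question asked for a variadic version, so here is a minimal sketch of how the `on=` argument could be combined with `Reduce` to chain any number of tables. `chainLeftJoin` and its `key_cols` parameter are hypothetical names used for illustration, not part of data.table, and the sketch assumes a data.table version in which `on=` is available. Because `X[Y, on=...]` keeps every row of `Y`, the accumulated result is placed in the `i` position so that all rows of the first table survive.

```r
library(data.table)  # assumes a version that supports the `on=` argument

fruits <- data.table(FruitID = c(1, 2, 3), Fruit = c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID = c(1, 2, 3, 4, 5), FruitID = c(1, 1, 1, 2, 3),
                     Color = c("Red", "Yellow", "Green", "Yellow", "Red"))
tastes <- data.table(TasteID = c(1, 2, 3), FruitID = c(1, 1, 3),
                     Taste = c("Sweeet", "Sour", "Sweet"))

# Hypothetical helper: keeps every row of `first` and joins each further
# table in turn on the shared column(s) named in `key_cols`.
chainLeftJoin <- function(first, ..., key_cols) {
  Reduce(
    # acc is the result so far; tbl is the next table to attach.
    # tbl[acc, ...] keeps all rows of acc and only matching rows of tbl.
    function(acc, tbl) tbl[acc, on = key_cols, allow.cartesian = TRUE],
    list(...),
    init = first
  )
}

result <- chainLeftJoin(fruits, colors, tastes, key_cols = "FruitID")
```

With the sample data this should give the same eight rows as the `tastes[colors[fruits]]` result in the question, including the Banana row with missing taste columns. No keys are set or restored, so the input tables are left untouched.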
Answer 1 (score: 6)
You can use Reduce (from base R) together with left_join (from dplyr) on a list of data.table objects, provided that you are joining the tables on common column names and would like to avoid setting keys multiple times on the data.table objects.
library(data.table) # <= v1.9.4
library(dplyr) # left_join
Reduce(function(...) left_join(...), list(fruits,colors,tastes))
# Source: local data table [8 x 6]
# FruitID Fruit ColorID Color TasteID Taste
#1 1 Apple 1 Red 1 Sweeet
#2 1 Apple 1 Red 2 Sour
#3 1 Apple 2 Yellow 1 Sweeet
#4 1 Apple 2 Yellow 2 Sour
#5 1 Apple 3 Green 1 Sweeet
#6 1 Apple 3 Green 2 Sour
#7 2 Banana 4 Yellow NA NA
#8 3 Strawberry 5 Red 3 Sweet
Another option, in pure data.table, was mentioned by @Frank. Note that it requires the key of every data.table object to be set to FruitID.
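The code for the pure data.table option is not preserved in this text, so the following is only a sketch of what a keyed `Reduce` chain might look like, not necessarily the exact code that @Frank suggested. It assumes that all three tables are keyed on FruitID beforehand, and it relies on the keyed join `y[x]` keeping every row of `x`.

```r
library(data.table)

fruits <- data.table(FruitID = c(1, 2, 3), Fruit = c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID = c(1, 2, 3, 4, 5), FruitID = c(1, 1, 1, 2, 3),
                     Color = c("Red", "Yellow", "Green", "Yellow", "Red"))
tastes <- data.table(TasteID = c(1, 2, 3), FruitID = c(1, 1, 3),
                     Taste = c("Sweeet", "Sour", "Sweet"))

# Assumption: every table is keyed on the shared join column.
setkey(fruits, FruitID)
setkey(colors, FruitID)
setkey(tastes, FruitID)

# x is the result accumulated so far, y is the next table.
# y[x] keeps all rows of x and attaches the matching rows of y.
keyed_result <- Reduce(
  function(x, y) y[x, allow.cartesian = TRUE],
  list(fruits, colors, tastes)
)
```

Unlike the `on=` approach, this changes the keys of `colors` and `tastes` by reference, so any code that depends on the original ColorID and TasteID keys would need to reset them afterwards.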