I have a data frame df1:
df1<- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4))
ID CLASS
1 A 1
2 B 1
3 A 2
4 A 1
5 B 4
and two other data frames, A and B:
> A<- data.frame(CLASS = c(1,2,3), DESCRIPTION = c("Unknown", "Tall", "Short"))
CLASS DESCRIPTION
1 1 Unknown
2 2 Tall
3 3 Short
> B <- data.frame(CLASS = c(1,2,3,4), DESCRIPTION = c("Big", "Small", "Medium", "Very Big"))
CLASS DESCRIPTION
1 1 Big
2 2 Small
3 3 Medium
4 4 Very Big
I want to merge these three data frames based on the ID and CLASS columns of df1, so that I end up with the following:
ID CLASS DESCRIPTION
1 A 1 Unknown
2 B 1 Big
3 A 2 Tall
4 A 1 Unknown
5 B 4 Very Big
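I know I can merge A with df1 <- merge(df1, A, by = "CLASS"), but I can't find a way to add a condition (and there may be too many "if"s for that to be practical) so that B is merged according to ID.
I need an efficient way to do this, because I will be applying it to more than 2 million rows.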
Answer 0 (score: 2)
Add an ID variable to A and B, rbind A and B together, and then merge on both ID and CLASS:
A$ID = 'A'
B$ID = 'B'
AB <- rbind(A, B)
merge(df1, AB, by = c('ID', 'CLASS'))
ID CLASS DESCRIPTION
1 A 1 Unknown
2 A 1 Unknown
3 A 2 Tall
4 B 1 Big
5 B 4 Very Big
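Note that merge sorts the result by the key columns, so the rows above no longer follow the order of df1 shown in the question. If the original row order matters, one possible alternative is a vectorized lookup with match on a composite key. This is a minimal sketch, not part of the original answer; it assumes each (ID, CLASS) pair occurs at most once in the lookup table and that "_" never appears inside ID. The names key_df1, key_AB and result are made up for illustration.

```r
df1 <- data.frame(ID = c("A", "B", "A", "A", "B"), CLASS = c(1, 1, 2, 1, 4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1, 2, 3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1, 2, 3, 4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# Stack the two lookup tables, tagging each row with the ID it belongs to
A$ID <- "A"
B$ID <- "B"
AB <- rbind(A, B)

# Composite keys (hypothetical helper names); assumes "_" is not part of any ID
key_df1 <- paste(df1$ID, df1$CLASS, sep = "_")
key_AB  <- paste(AB$ID, AB$CLASS, sep = "_")

# match() returns, for each row of df1, the position of its key in AB
# (NA when there is no match), so the row order of df1 is preserved
result <- df1
result$DESCRIPTION <- AB$DESCRIPTION[match(key_df1, key_AB)]
result
```

Rows of df1 without a matching key get NA in DESCRIPTION, which corresponds to a left join rather than the inner join that merge performs by default.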
I recommend using stringsAsFactors = FALSE when creating the data:
df1 <- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4),
stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3),
DESCRIPTION = c("Unknown", "Tall", "Short"),
stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4),
DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
stringsAsFactors = FALSE)
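One practical reason to prefer character columns (this is an assumption about the motivation, not something the answer states) is that some vectorized functions silently drop factor labels. The sketch below shows ifelse returning the underlying integer codes for a factor but the strings for a character vector. In R 4.0.0 and later, stringsAsFactors = FALSE is already the default for data.frame. The variable names are illustrative only.

```r
# A factor stores integer codes plus a set of labels (levels)
f <- factor(c("Unknown", "Tall"))
# The same values as a plain character vector
s <- c("Unknown", "Tall")

# ifelse() strips attributes from its result, so a factor input
# comes back as its integer codes rather than its labels
from_factor <- ifelse(c(TRUE, TRUE), f, f)

# With character input the strings are returned unchanged
from_character <- ifelse(c(TRUE, TRUE), s, s)

print(from_factor)
print(from_character)
```

Wrapping the factor in as.character() before calling ifelse avoids the problem when the data already contains factors.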
Answer 1 (score: 1)
To merge multiple data frames in one go, Reduce is often helpful:
out <- Reduce(function(x, y) merge(x, y, by = "CLASS", all.x = TRUE), list(df1, A, B))
out
CLASS ID DESCRIPTION.x DESCRIPTION.y
1 1 A Unknown Big
2 1 B Unknown Big
3 1 A Unknown Big
4 2 A Tall Small
5 4 B <NA> Very Big
As you can see, columns that exist in all of the data frames get a suffix added (the default merge behavior). This lets you apply whatever logic you like to obtain the final column you want. For example:
out$Description <- ifelse(out$ID == "A", as.character(out$DESCRIPTION.x), as.character(out$DESCRIPTION.y))
> out
CLASS ID DESCRIPTION.x DESCRIPTION.y Description
1 1 A Unknown Big Unknown
2 1 B Unknown Big Big
3 1 A Unknown Big Unknown
4 2 A Tall Small Tall
5 4 B <NA> Very Big Very Big
Note that ifelse is vectorized and very efficient.
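To get from this intermediate table to the three-column result asked for in the question, the helper columns can be dropped once the final column has been computed. This is a minimal self-contained sketch rather than part of the original answer. It assumes character columns (stringsAsFactors = FALSE), which is why as.character is omitted, and it treats every ID other than "A" as belonging to B.

```r
df1 <- data.frame(ID = c("A", "B", "A", "A", "B"), CLASS = c(1, 1, 2, 1, 4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1, 2, 3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1, 2, 3, 4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# Left-join A and then B onto df1 by CLASS; the clashing DESCRIPTION
# columns receive the default suffixes .x (from A) and .y (from B)
out <- Reduce(function(x, y) merge(x, y, by = "CLASS", all.x = TRUE),
              list(df1, A, B))

# Pick the description that belongs to each row's ID
# (assumption: any ID other than "A" should use the value from B)
out$DESCRIPTION <- ifelse(out$ID == "A", out$DESCRIPTION.x, out$DESCRIPTION.y)

# Keep only the columns requested in the question
out <- out[, c("ID", "CLASS", "DESCRIPTION")]
out
```

The rows come back sorted by CLASS rather than in the original order of df1, and with more than two lookup tables the single ifelse would have to be replaced by something more general.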
Answer 2 (score: 1)
A dplyr solution:
library(dplyr)
bind_rows(lst(A,B),.id="ID") %>% inner_join(df1)
# ID CLASS DESCRIPTION
# 1 A 1 Unknown
# 2 A 1 Unknown
# 3 A 2 Tall
# 4 B 1 Big
# 5 B 4 Very Big
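The inner_join above infers the join columns and returns the rows in the order of the stacked lookup table. A possible variant, sketched here under the assumption that dplyr is installed, starts from df1 with left_join and names the keys explicitly, which keeps the rows in the order of df1. The names lookup and result are illustrative and do not come from the original answer.

```r
library(dplyr)

df1 <- data.frame(ID = c("A", "B", "A", "A", "B"), CLASS = c(1, 1, 2, 1, 4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1, 2, 3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1, 2, 3, 4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# Stack the lookup tables; .id = "ID" stores each list name ("A" or "B")
# in a new ID column
lookup <- bind_rows(list(A = A, B = B), .id = "ID")

# left_join keeps every row of df1, in its original order, and fills
# DESCRIPTION with NA where no (ID, CLASS) match exists
result <- left_join(df1, lookup, by = c("ID", "CLASS"))
result
```

Use inner_join instead if rows of df1 without a matching description should be dropped.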