Merging different data frames based on column values

Time: 2018-07-31 21:00:55

Tags: r dataframe merge

I have a data frame df1

df1<- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4))
  ID CLASS
1  A     1
2  B     1
3  A     2
4  A     1
5  B     4

and two other data frames, A and B

> A <- data.frame(CLASS = c(1,2,3), DESCRIPTION = c("Unknown", "Tall", "Short"))
  CLASS DESCRIPTION
1     1     Unknown
2     2        Tall
3     3       Short

> B <- data.frame(CLASS = c(1,2,3,4), DESCRIPTION = c("Big", "Small", "Medium", "Very Big"))
  CLASS DESCRIPTION
1     1         Big
2     2       Small
3     3      Medium
4     4    Very Big

I want to merge these three data frames based on the ID and CLASS columns of df1, so that I end up with the following:

  ID CLASS DESCRIPTION
1  A     1     Unknown
2  B     1         Big
3  A     2        Tall
4  A     1     Unknown
5  B     4    Very Big

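I know I can merge with df1 <- merge(df1, A, by = "CLASS"), but I can't find a way to add a condition (an "if" might be overkill) so that B is merged in depending on ID. I need an efficient way to do this, because I will apply it to more than 2 million rows.

3 answers:

Answer 0: (score: 2)

Add an ID variable to A and B, rbind A and B together, and then merge on ID and CLASS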

A$ID = 'A'
B$ID = 'B'

AB <- rbind(A, B)

merge(df1, AB, by = c('ID', 'CLASS'))

  ID CLASS DESCRIPTION
1  A     1     Unknown
2  A     1     Unknown
3  A     2        Tall
4  B     1         Big
5  B     4    Very Big

I recommend using stringsAsFactors = FALSE when creating the data

df1 <- data.frame(ID = c("A","B","A","A","B"),CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3), 
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4), 
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)
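The merge() call in this answer returns rows sorted by the join keys (its sort argument defaults to TRUE), whereas the output requested in the question keeps the original row order of df1. One possible way to keep that order in base R is a lookup with match() on a composite key. This is only a sketch: the key_df1 and key_AB names and the "_" separator are illustrative assumptions and are not part of the original answer.

```r
# Recreate the example data with character (not factor) columns
df1 <- data.frame(ID = c("A","B","A","A","B"), CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# Tag each lookup table with the ID it belongs to and stack them
A$ID <- "A"
B$ID <- "B"
AB <- rbind(A, B)

# Composite keys (hypothetical helper names); "_" is an arbitrary separator
key_df1 <- paste(df1$ID, df1$CLASS, sep = "_")
key_AB  <- paste(AB$ID, AB$CLASS, sep = "_")

# match() returns, for each df1 row, the position of its key in AB,
# so the rows of df1 stay in their original order
df1$DESCRIPTION <- AB$DESCRIPTION[match(key_df1, key_AB)]
df1
```

Because match() is vectorized, this should also scale reasonably to a few million rows, although that has not been measured here.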

Answer 1: (score: 1)

To merge several data frames in one go, Reduce is often helpful:

out <- Reduce(function(x,y) merge(x,y, by = "CLASS", all.x=T), list(df1, A, B))
out
  CLASS ID DESCRIPTION.x DESCRIPTION.y
1     1  A       Unknown           Big
2     1  B       Unknown           Big
3     1  A       Unknown           Big
4     2  A          Tall         Small
5     4  B          <NA>      Very Big

As you can see, columns that exist in all of the data frames get a suffix (the default merge behavior). This lets you apply whatever logic you need to derive the final column. For example,

out$Description <- ifelse(out$ID == "A", as.character(out$DESCRIPTION.x), as.character(out$DESCRIPTION.y))
> out
  CLASS ID DESCRIPTION.x DESCRIPTION.y Description
1     1  A       Unknown           Big     Unknown
2     1  B       Unknown           Big         Big
3     1  A       Unknown           Big     Unknown
4     2  A          Tall         Small        Tall
5     4  B          <NA>      Very Big    Very Big

Note that ifelse is vectorized and very efficient.

Answer 2: (score: 1)

A dplyr solution:
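The suffixes that merge() appends to clashing column names can also be set explicitly through its suffixes argument, which may make the final selection step easier to read. The following is a minimal sketch of that variant, not the answerer's own code: it merges in two explicit steps instead of using Reduce, and the step1 name and the ".A" / ".B" suffixes are illustrative choices.

```r
# Example data, created as character columns so no as.character() is needed
df1 <- data.frame(ID = c("A","B","A","A","B"), CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# First left join: brings in A's DESCRIPTION (no name clash yet)
step1 <- merge(df1, A, by = "CLASS", all.x = TRUE)

# Second left join: DESCRIPTION now clashes, so name both copies explicitly
out <- merge(step1, B, by = "CLASS", all.x = TRUE, suffixes = c(".A", ".B"))

# Pick the description that corresponds to each row's ID
out$DESCRIPTION <- ifelse(out$ID == "A", out$DESCRIPTION.A, out$DESCRIPTION.B)
out
```

As with the Reduce version, the rows come back sorted by CLASS rather than in the original order of df1.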

library(dplyr)
bind_rows(lst(A,B),.id="ID") %>% inner_join(df1)
#   ID CLASS DESCRIPTION
# 1  A     1     Unknown
# 2  A     1     Unknown
# 3  A     2        Tall
# 4  B     1         Big
# 5  B     4    Very Big
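The inner_join above puts the lookup table on the left, so the result follows the order of the stacked A and B tables. If the original row order of df1 matters, as in the output requested in the question, one option is to put df1 on the left of a left_join and state the join columns explicitly. This is a sketch under the assumption that dplyr is installed; the lookup name is illustrative.

```r
library(dplyr)

df1 <- data.frame(ID = c("A","B","A","A","B"), CLASS = c(1,1,2,1,4),
                  stringsAsFactors = FALSE)
A <- data.frame(CLASS = c(1,2,3),
                DESCRIPTION = c("Unknown", "Tall", "Short"),
                stringsAsFactors = FALSE)
B <- data.frame(CLASS = c(1,2,3,4),
                DESCRIPTION = c("Big", "Small", "Medium", "Very Big"),
                stringsAsFactors = FALSE)

# Stack A and B; the list names become the values of the new ID column
lookup <- bind_rows(list(A = A, B = B), .id = "ID")

# left_join keeps the rows of df1 in their original order
result <- left_join(df1, lookup, by = c("ID", "CLASS"))
result
```

Naming the by columns also avoids the message dplyr prints when it has to guess the join keys.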