How do I use the "merge, by = Column.name" function to merge on columns with different names?

Time: 2017-10-15 13:22:55

Tags: r merge rstudio

#Data1

SampleID <- c("A-01","B-01","C-01")
Value <- c(1,2,3)
data1 <- data.frame(SampleID, Value)

#Data2

SampleID <- c("A","B","C")
Value1 <- c(3,4,5)
data2 <- data.frame(SampleID,Value1)

# Output: this is what I want to get using:     merge(data1, data2, by=c("SampleID"), all = TRUE)

SampleID  Value  Value1
A-01        1       3
B-01        2       4
C-01        3       5
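
For context, here is a minimal sketch of what the plain merge call returns on this data. The key values differ ("A-01" vs "A"), so nothing matches and all = TRUE pads every row with NA. The name naive is introduced here only for illustration.

```r
data1 <- data.frame(SampleID = c("A-01", "B-01", "C-01"), Value = c(1, 2, 3))
data2 <- data.frame(SampleID = c("A", "B", "C"), Value1 = c(3, 4, 5))

# No SampleID in data1 equals a SampleID in data2, so the full outer join
# keeps all six keys and fills the missing side with NA.
naive <- merge(data1, data2, by = "SampleID", all = TRUE)
naive
```

The keys have to be brought to a common form before merging, which is what the answers below do.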

3 Answers:

Answer 0: (score: 1)

You can use the sqldf library:

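library(sqldf)
# SQLite (sqldf's default backend) concatenates strings with ||;
# this matches "A-01" against "A" followed by "-" and anything else
sqldf("SELECT data1.SampleID, data1.Value, data2.Value1 FROM data1 JOIN data2 ON data1.SampleID LIKE data2.SampleID || '-%'")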

Or use data.table, like the following:

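library(data.table)
dt1 <- data.table(data1)
dt2 <- data.table(data2)
# A join cannot match on a grepl() expression, so derive a key column first
dt1[, Key := sub("-.*$", "", SampleID)]
res <- merge(dt1, dt2, by.x = "Key", by.y = "SampleID", all = TRUE)
res[, Key := NULL]
res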

Answer 1: (score: 1)

I believe the following is what you need.

data1$NewID <- gsub("[^[:alpha:]]", "", data1$SampleID)
result <- merge(data1, data2, by.x = "NewID", by.y = "SampleID", all = TRUE)
result <- result[-1]
result
#  SampleID Value Value1
#1     A-01     1      3
#2     B-01     2      4
#3     C-01     3      5

Then, you can remove the extra column from data1 with:

data1 <- data1[-3]
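
If you would rather not add and then remove a helper column, the same lookup can be sketched with match(). This is an alternative to the merge call above, not part of the original answer, and it assumes every ID in data1 has the form prefix-suffix with the prefix stored in data2. The name key is introduced here only for illustration. Note that it behaves like a left join: rows of data2 with no counterpart in data1 are not kept, unlike all = TRUE.

```r
data1 <- data.frame(SampleID = c("A-01", "B-01", "C-01"), Value = c(1, 2, 3))
data2 <- data.frame(SampleID = c("A", "B", "C"), Value1 = c(3, 4, 5))

# Strip everything from the first hyphen onward: "A-01" becomes "A"
key <- sub("-.*$", "", data1$SampleID)

# match() returns, for each key, the row of data2 that holds it (NA if absent)
data1$Value1 <- data2$Value1[match(key, data2$SampleID)]
data1
```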

Answer 2: (score: 1)

To add to the collection, here is a dplyr solution, which is easier to read:

library(dplyr)  # provides %>%, mutate(), left_join() and select()
options(stringsAsFactors = FALSE)
SampleID <-c("A-01","B-01","C-01")
Value <- c(1,2,3)
data1 <- data.frame(SampleID, Value)

SampleID <- c("A","B","C")
Value1 <- c(3,4,5)
data2 <- data.frame(SampleID,Value1)

data1 %>% 
  mutate(new_id = gsub("[^[:alpha:]]", "", SampleID)) %>% 
  left_join(., data2, by = c("new_id" = "SampleID")) %>% 
  select(-new_id)

  SampleID Value Value1
1     A-01     1      3
2     B-01     2      4
3     C-01     3      5