Replace values where there is a match

Time: 2018-12-13 11:45:38

Tags: r dataframe match

I am new to R programming and am stuck on the example below.

Basically, I have two datasets:

Dataset 1:

ID       Category        
1        CatZZ         
2        CatVV         
3        CatAA  
4        CatQQ

Dataset 2:

ID  Category  
1   Cat600  
3   Cat611 

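I am trying to overwrite the Category values in dataset 1 with the Category values from dataset 2 wherever the ID matches between the two datasets.

So the result would look like this:

Dataset 1: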

ID  Category    
1   Cat600  
2   CatVV  
3   Cat611  
4   CatQQ  

5 answers:

Answer 0: (score: 3)

With the tidyverse, you can do the following:

library(dplyr)

df1 %>%
 left_join(df2, by = "ID") %>% # Join the two data frames on ID
 mutate(Category = if_else(!is.na(Category.y), Category.y, Category.x)) %>% # Where there is a match, take the value from df2, otherwise keep the one from df1
 select(ID, Category) # Drop the redundant columns

  ID Category
1  1   Cat600
2  2    CatVV
3  3   Cat611
4  4    CatQQ
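
As a more compact variant of the same join, `coalesce()` from dplyr returns the first non-missing value, so it can stand in for the `if_else()` test. This is only a sketch: the `.old`/`.new` suffixes and the `result` name are picked here for illustration and are not part of the original answer.

```r
library(dplyr)

# Same sample data as in the question
df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  left_join(df2, by = "ID", suffix = c(".old", ".new")) %>% # df1's column becomes Category.old, df2's Category.new
  mutate(Category = coalesce(Category.new, Category.old)) %>% # prefer df2's value, fall back to df1's
  select(ID, Category)

result
```

Rows of df1 without a partner in df2 get NA in `Category.new`, which is why the fallback to `Category.old` is needed.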

Or:

library(dplyr)
library(tidyr)

df1 %>%
 left_join(df2, by = "ID") %>% # Join the two data frames on ID
 gather(var, val, -ID) %>% # Reshape the data from wide to long format
 arrange(ID) %>% # Sort by ID
 group_by(ID) %>% # Group by ID
 mutate(Category = if_else(!is.na(nth(val, 2)), nth(val, 2), first(val))) %>% # If non-NA, take the value from df2, otherwise from df1
 spread(var, val) %>% # Return the data to wide format
 select(ID, Category) # Drop the redundant columns

     ID Category
  <int> <chr>   
1     1 Cat600  
2     2 CatVV   
3     3 Cat611  
4     4 CatQQ

Sample data:

df1 <- read.table(text = "ID       Category        
1        CatZZ         
2        CatVV         
3        CatAA  
4        CatQQ", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "ID  Category  
1   Cat600  
3   Cat611", header = TRUE, stringsAsFactors = FALSE)

Answer 1: (score: 2)

Another option is to use the data.table package.

Using the same setup as in @tmfmnk's answer:

Build the sample datasets:

df1 <- read.table(text = "ID       Category        
1        CatZZ         
2        CatVV         
3        CatAA  
4        CatQQ", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "ID  Category  
1   Cat600  
3   Cat611", header = TRUE, stringsAsFactors = FALSE)

Load the data.table package and convert the data frames to data tables:

library(data.table)
df1 <- data.table(df1)
df2 <- data.table(df2)

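Perform a left join

(keep every row of df1, attach the Category from df2 where the ID matches, then create a new column a that combines the information from df1 and df2; in the joined table, Category comes from df2 and i.Category from df1)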

a <- df2[df1, on = "ID"][, a := ifelse(is.na(Category), i.Category, Category)]

There is a good Q&A on data.table joins here: Left join using data.table

In addition, to get exactly the desired result, you can do the following:

a <- df2[df1, on = "ID"][, list(ID, Category = ifelse(is.na(Category), i.Category, Category))]
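
If a separate result table is not needed, data.table can also overwrite the values in place with an update join. The sketch below is an alternative to the approach above, not a restatement of it; `dt1` and `dt2` are illustrative names standing in for df1 and df2.

```r
library(data.table)

# Same sample data as in the question, built directly as data.tables
dt1 <- data.table(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"))
dt2 <- data.table(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"))

# For every row of dt1 whose ID appears in dt2, overwrite Category
# with dt2's value (referred to as i.Category inside the join).
dt1[dt2, on = "ID", Category := i.Category]

dt1
```

Because `:=` modifies `dt1` by reference, no reassignment is needed, and rows of `dt1` without a match are left untouched.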

Answer 2: (score: 2)

Using a combination of the base R match function and data.table's set function:

matchinds <- match(dataset2$ID, dataset1$ID) # for each dataset2$ID, the row position in dataset1 with the same ID (NA if there is none)
found <- !is.na(matchinds) # TRUE for the dataset2 rows that have a match in dataset1

set(x = dataset1, i = matchinds[found], j = "Category", value = dataset2$Category[found]) # overwrite Category in the matching rows of dataset1 with the values from dataset2
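
The direction of the `match()` call decides which data set the returned positions point into, which matters because `set()` needs row positions in the table being updated. A small illustration on the IDs from the question (the vector names are made up for this sketch):

```r
ids1 <- 1:4        # IDs of dataset 1
ids2 <- c(1L, 3L)  # IDs of dataset 2

# Positions in ids2, one entry per element of ids1
pos_in_2 <- match(ids1, ids2)
pos_in_2
# 1 NA 2 NA

# Positions in ids1, one entry per element of ids2
pos_in_1 <- match(ids2, ids1)
pos_in_1
# 1 3
```

To update rows 1 and 3 of dataset 1, the second form is the one that yields usable row indices.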

Answer 3: (score: 2)

Using base R

The match approach
df1$Category[match(df2$ID, df1$ID)] <- df2$Category
df1
#  ID Category
#1  1   Cat600
#2  2    CatVV
#3  3   Cat611
#4  4    CatQQ
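
The one-liner assumes every ID in df2 also occurs in df1. If that is not guaranteed, `match()` returns NA for the missing IDs and the assignment would fail, so a guarded version can be useful. A minimal sketch, where the extra row with ID 9 and the names `idx` and `ok` are assumptions added for illustration:

```r
df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
# df2 with one ID (9) that does not exist in df1
df2 <- data.frame(ID = c(1L, 3L, 9L),
                  Category = c("Cat600", "Cat611", "Cat999"),
                  stringsAsFactors = FALSE)

idx <- match(df2$ID, df1$ID)  # positions in df1; NA for ID 9
ok  <- !is.na(idx)            # keep only the df2 rows that have a match

df1$Category[idx[ok]] <- df2$Category[ok]
df1
```

The unmatched row of df2 is simply ignored, and df1 keeps its four rows.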

Data

df1 <- structure(list(ID = 1:4, Category = c("CatZZ", "CatVV", "CatAA", 
"CatQQ")), .Names = c("ID", "Category"), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(ID = c(1L, 3L), Category = c("Cat600", "Cat611"
)), .Names = c("ID", "Category"), class = "data.frame", row.names = c(NA, 
-2L))

Answer 4: (score: 1)

You can stack df2 on top of df1 and keep the first instance of each ID.

With the tidyverse:

library(tidyverse)
bind_rows(df2,df1) %>%
  group_by(ID) %>%
  slice(1) %>%
  ungroup()

# # A tibble: 4 x 2
#      ID Category
#   <int>    <chr>
# 1     1   Cat600
# 2     2    CatVV
# 3     3   Cat611
# 4     4    CatQQ

Or a base R version (which reorders the rows):

subset(rbind(df2,df1), !duplicated(ID))
#   ID Category
# 1  1   Cat600
# 2  3   Cat611
# 4  2    CatVV
# 6  4    CatQQ
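
If the original row order matters, the base version can be followed by a sort on ID. A short sketch under the assumption that sorting by ID reproduces the desired order (true for the sample data, not necessarily in general); `stacked` and `res` are illustrative names:

```r
df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"),
                  stringsAsFactors = FALSE)

stacked <- rbind(df2, df1)                    # df2 first, so its rows win
res <- stacked[!duplicated(stacked$ID), ]     # keep the first row per ID
res <- res[order(res$ID), ]                   # restore ascending ID order
rownames(res) <- NULL                         # reset the row names to 1..n
res
```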