I'm new to R programming and am stuck on the example below.
Basically, I have two datasets:
Dataset 1:
ID Category
1 CatZZ
2 CatVV
3 CatAA
4 CatQQ
Dataset 2:
ID Category
1 Cat600
3 Cat611
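I'm trying to overwrite the "Category" values in Dataset 1 with the "Category" values from Dataset 2 wherever the ID matches between the two datasets.
So the result would look like this:
Dataset 1: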
ID Category
1 Cat600
2 CatVV
3 Cat611
4 CatQQ
Answer 0 (score: 3)
With the tidyverse, you can do the following:
library(dplyr)

df1 %>%
  left_join(df2, by = "ID") %>% # Merge the two data frames on ID
  mutate(Category = if_else(!is.na(Category.y), Category.y, Category.x)) %>% # If there is a match, take the value from df2, otherwise from df1
  select(ID, Category) # Drop the redundant variables
ID Category
1 1 Cat600
2 2 CatVV
3 3 Cat611
4 4 CatQQ
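The if_else() step above just picks the first non-missing value, which is what dplyr's coalesce() does. A minimal sketch of the same join written that way, with the sample data repeated so it runs on its own (the name `result` is only for illustration):

```r
library(dplyr)

df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  left_join(df2, by = "ID", suffix = c(".x", ".y")) %>%  # Category.x from df1, Category.y from df2
  mutate(Category = coalesce(Category.y, Category.x)) %>% # prefer df2's value, fall back to df1's
  select(ID, Category)

result
```

Note that, as with the if_else() version, a Category that is NA in df2 is treated the same as "no match", so the value from df1 is kept.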
Or:
library(dplyr)
library(tidyr)

df1 %>%
  left_join(df2, by = "ID") %>% # Merge the two data frames on ID
  gather(var, val, -ID) %>% # Transform the data from wide to long format
  arrange(ID) %>% # Arrange by ID
  group_by(ID) %>% # Group by ID
  mutate(Category = if_else(!is.na(nth(val, 2)), nth(val, 2), first(val))) %>% # If non-NA, take the value from df2, otherwise from df1
  spread(var, val) %>% # Return the data to wide format
  select(ID, Category) # Remove the redundant variables
ID Category
<int> <chr>
1 1 Cat600
2 2 CatVV
3 3 Cat611
4 4 CatQQ
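If a reasonably recent dplyr is available (this sketch assumes version 1.0.0 or later, where rows_update() was introduced), the whole operation can probably be written as a single call. The sample data is repeated so the block runs on its own, and `updated` is an illustrative name:

```r
library(dplyr)

df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"),
                  stringsAsFactors = FALSE)

# For every ID in df2, overwrite the non-key columns of the matching row in df1.
updated <- rows_update(df1, df2, by = "ID")

updated
```

rows_update() expects the ID values in df2 to be unique and, by default, to all be present in df1; it raises an error otherwise, which can be a useful safety check.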
Sample data:
df1 <- read.table(text = "ID Category
1 CatZZ
2 CatVV
3 CatAA
4 CatQQ", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "ID Category
1 Cat600
3 Cat611", header = TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 2)
Another option is to use the data.table package.
Using the same setup as in @tmfmnk's answer:
Build the example datasets:
df1 <- read.table(text = "ID Category
1 CatZZ
2 CatVV
3 CatAA
4 CatQQ", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "ID Category
1 Cat600
3 Cat611", header = TRUE, stringsAsFactors = FALSE)
Load the data.table package and convert the data frames to data tables:
library(data.table)
df1 <- data.table(df1)
df2 <- data.table(df2)
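Perform a left join (keep every row of df1, attach the Category from df2 where the ID matches, then create a new column that combines the information from df1 and df2):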
a <- df2[df1, on = "ID"][, a := ifelse(is.na(Category), i.Category, Category)]
There is a good Q&A on data.table joins here: Left join using data.table
Additionally, to get exactly the desired result, you can do:
a <- df2[df1, on = "ID"][, list(ID, Category = ifelse(is.na(Category), i.Category, Category))]
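The joins above build a new table. data.table also supports an update join, which overwrites the column in place for the matching rows only. A minimal sketch, using fresh tables named `dt1` and `dt2` (illustrative names) so the originals are left untouched:

```r
library(data.table)

dt1 <- data.table(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"))
dt2 <- data.table(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"))

# For rows of dt1 whose ID is found in dt2, overwrite Category by reference.
# i.Category refers to the Category column of dt2 (the table in the i position).
dt1[dt2, on = "ID", Category := i.Category]

dt1
```

Rows of dt1 without a match in dt2 are not touched, so no is.na() check is needed.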
Answer 2 (score: 2)
Using a combination of the base R match function and data.table's set function:
library(data.table)
idx <- match(dataset1$ID, dataset2$ID) # for each row of dataset1, the index of the matching row in dataset2 (NA if there is none)
hit <- which(!is.na(idx)) # rows of dataset1 that have a match in dataset2
set(x = dataset1, i = hit, j = "Category", value = dataset2$Category[idx[hit]]) # overwrite Category in those rows of dataset1 with the matching values from dataset2
Answer 3 (score: 2)
Using the base R match approach:
df1$Category[match(df2$ID, df1$ID)] <- df2$Category
df1
# ID Category
#1 1 Cat600
#2 2 CatVV
#3 3 Cat611
#4 4 CatQQ
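The one-liner above relies on every ID in df2 also being present in df1; otherwise match() returns NA and the assignment fails. A slightly more defensive sketch matches in the other direction and updates only the rows that were found. The extra row with ID 9 is a hypothetical addition to demonstrate the unmatched case:

```r
df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
# ID 9 is hypothetical: it does not occur in df1 and should simply be ignored.
df2 <- data.frame(ID = c(1L, 3L, 9L),
                  Category = c("Cat600", "Cat611", "Cat999"),
                  stringsAsFactors = FALSE)

idx <- match(df1$ID, df2$ID)  # position in df2 for each row of df1, NA if absent
found <- !is.na(idx)          # rows of df1 that have a counterpart in df2
df1$Category[found] <- df2$Category[idx[found]]

df1
```

Matching from df1 into df2 keeps the result aligned with the rows of df1, so unmatched IDs on either side are left alone.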
Data
df1 <- structure(list(ID = 1:4, Category = c("CatZZ", "CatVV", "CatAA",
"CatQQ")), .Names = c("ID", "Category"), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(ID = c(1L, 3L), Category = c("Cat600", "Cat611"
)), .Names = c("ID", "Category"), class = "data.frame", row.names = c(NA,
-2L))
Answer 4 (score: 1)
You can stack df2 on top of df1 and keep the first instance of each ID.
With the tidyverse, that is:
library(tidyverse)
bind_rows(df2, df1) %>%
  group_by(ID) %>%
  slice(1) %>%
  ungroup()
# # A tibble: 4 x 2
# ID Category
# <int> <chr>
# 1 1 Cat600
# 2 2 CatVV
# 3 3 Cat611
# 4 4 CatQQ
Or a base R version (which reorders the rows):
subset(rbind(df2,df1), !duplicated(ID))
# ID Category
# 1 1 Cat600
# 2 3 Cat611
# 4 2 CatVV
# 6 4 CatQQ
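If the original row order matters, the base R result can be sorted by ID afterwards. A small sketch under the assumption that ordering by ID is what is wanted; the intermediate names `stacked`, `deduped` and `result_sorted` are illustrative:

```r
df1 <- data.frame(ID = 1:4,
                  Category = c("CatZZ", "CatVV", "CatAA", "CatQQ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1L, 3L),
                  Category = c("Cat600", "Cat611"),
                  stringsAsFactors = FALSE)

stacked <- rbind(df2, df1)                      # df2 first, so its rows win
deduped <- stacked[!duplicated(stacked$ID), ]   # keep the first row per ID
result_sorted <- deduped[order(deduped$ID), ]   # restore ascending ID order
rownames(result_sorted) <- NULL                 # reset row names to 1..n

result_sorted
```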