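I want to replace rows of the first data frame in the example below with the rows from data frame 2, matched on the ID column. For example: suppose person X has 100 items in data frame 1, but data frame 2 shows that he actually has only 50 items and the other 50 belong to Z. The final result should then contain one row for person X with 50 items and another row for person Z with 50 items, both with the same ID.
Data frame 1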
ID Name Status Items
16 Amy B Closed 100
10 Erik C Closed 80
14 Paul R Closed 20
17 Chris K Closed 40
19 Ali I Closed 60
22 Jenny A Closed 40
Data frame 2
ID Name Items
14 Paul R 10
14 Sarah K 10
22 Jenny A 30
22 Brian L 10
Result
ID Name Status Items
16 Amy B Closed 100
10 Erik C Closed 80
14 Paul R Closed 10
14 Sarah K Closed 10
17 Chris K Closed 40
19 Ali I Closed 60
22 Jenny A Closed 30
22 Brian L Closed 10
Answer 0 (score: 1)
It looks like you are doing a merge here, with the "Items" values in data frame 2 taking priority.
Try the dplyr package, using left_join() and full_join().
Load the data...
library(dplyr)  # provides left_join(), full_join(), mutate(), select() and %>%
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
'ID Name Status Items
16 Amy_B Closed 100
10 Erik_C Closed 80
14 Paul_R Closed 20
17 Chris_K Closed 40
19 Ali_I Closed 60
22 Jenny_A Closed 40')
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Name Items
14 Paul_R 10
14 Sarah_K 10
22 Jenny_A 30
22 Brian_L 10")
Merge the tables
# add the status column to df2
df <- left_join(df2, df1 %>% select(ID, Status), by = 'ID')
# ID Name Items Status
# 14 Paul_R 10 Closed
# 14 Sarah_K 10 Closed
# 22 Jenny_A 30 Closed
# 22 Brian_L 10 Closed
# combine both data frames by merging for both ID and Name
df <- full_join(df, df1,
                by = c('ID', 'Name', 'Status'),
                suffix = c('.1', '.2'))
# ID Name Items.1 Status Items.2
# 14 Paul_R 10 Closed 20
# 14 Sarah_K 10 Closed NA
# 22 Jenny_A 30 Closed 40
# 22 Brian_L 10 Closed NA
# 16 Amy_B NA Closed 100
# 10 Erik_C NA Closed 80
# 17 Chris_K NA Closed 40
# 19 Ali_I NA Closed 60
# create a new column which selects the df2 value if that exists, otherwise uses df1 value
df <- df %>%
  mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>%
  select(-Items.1, -Items.2)
# ID Name Status Items
# 14 Paul_R Closed 10
# 14 Sarah_K Closed 10
# 22 Jenny_A Closed 30
# 22 Brian_L Closed 10
# 16 Amy_B Closed 100
# 10 Erik_C Closed 80
# 17 Chris_K Closed 40
# 19 Ali_I Closed 60
Putting it all together...
left_join(df2, df1 %>% select(ID, Status), by = 'ID') %>%
  full_join(df1,
            by = c('ID', 'Name', 'Status'),
            suffix = c('.1', '.2')) %>%
  # take the df2 value (Items.1) where it exists, otherwise the df1 value (Items.2)
  mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>%
  select(-Items.1, -Items.2)
This gives the following table as output:
ID Name Status Items
14 Paul_R Closed 10
14 Sarah_K Closed 10
22 Jenny_A Closed 30
22 Brian_L Closed 10
16 Amy_B Closed 100
10 Erik_C Closed 80
17 Chris_K Closed 40
19 Ali_I Closed 60
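As a possible variation on the pipeline above, dplyr's coalesce() can stand in for the ifelse(is.na(...)) step, since it returns the first non-missing value of its arguments. The sketch below assumes the same sample data with underscore-joined names; the name `result` is only illustrative, and the final select() reorders the columns to match the layout asked for in the question.

```r
library(dplyr)

# Sample data as in the answer above (names joined with "_" so read.table can parse them)
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Name Status Items
16 Amy_B Closed 100
10 Erik_C Closed 80
14 Paul_R Closed 20
17 Chris_K Closed 40
19 Ali_I Closed 60
22 Jenny_A Closed 40")

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Name Items
14 Paul_R 10
14 Sarah_K 10
22 Jenny_A 30
22 Brian_L 10")

result <- df2 %>%
  # bring Status over from df1, matched on ID
  left_join(select(df1, ID, Status), by = "ID") %>%
  # keep every row from both tables; Items.1 comes from df2, Items.2 from df1
  full_join(df1, by = c("ID", "Name", "Status"), suffix = c(".1", ".2")) %>%
  # prefer the df2 value, fall back to the df1 value when it is missing
  mutate(Items = coalesce(Items.1, Items.2)) %>%
  select(ID, Name, Status, Items)

print(result)
```

Both forms should give the same values here; coalesce() mainly reads more directly as "first available value".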
Answer 1 (score: 0)
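Assuming your real data is as regular as the sample data, you have redundant information. The important pieces are:
the rows of df1 whose ID does not appear in df2
the rows of df2
the Status column of df1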
What we do is first add the Status information to df2 (merge(df2, df1[c(1,3)])), and then rbind the relevant item rows of df1 and df2.
rbind(df1[!df1$ID %in% df2$ID, ], merge(df2, df1[c(1, 3)]))
# ID Name Status Items
# 1 16 Amy B Closed 100
# 2 10 Erik C Closed 80
# 4 17 Chris K Closed 40
# 5 19 Ali I Closed 60
# 11 14 Paul R Closed 10
# 21 14 Sarah K Closed 10
# 3 22 Jenny A Closed 30
# 41 22 Brian L Closed 10
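The rbind() result above puts the replaced IDs at the end, whereas the result requested in the question keeps the ID order of data frame 1. A minimal sketch of one way to restore that order in base R follows, assuming IDs are unique in df1. The name `res` is only illustrative, and the sample names are written here without trailing padding.

```r
# Sample data with quoted names, so that names containing spaces are read as one field
df1 <- read.table(text = "ID Name Status Items
16 'Amy B' Closed 100
10 'Erik C' Closed 80
14 'Paul R' Closed 20
17 'Chris K' Closed 40
19 'Ali I' Closed 60
22 'Jenny A' Closed 40", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "ID Name Items
14 'Paul R' 10
14 'Sarah K' 10
22 'Jenny A' 30
22 'Brian L' 10", header = TRUE, stringsAsFactors = FALSE)

# Same combination as in the answer: untouched df1 rows plus df2 rows with Status attached
res <- rbind(df1[!df1$ID %in% df2$ID, ], merge(df2, df1[c(1, 3)]))

# match() gives the position of each ID in df1; ordering by it restores df1's ID order
res <- res[order(match(res$ID, df1$ID)), ]
rownames(res) <- NULL  # drop the leftover row names from rbind/merge

print(res)
```

rbind() on data frames matches columns by name, so the differing column order returned by merge() does not matter here.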
Data
df1 <- read.table(text="ID Name Status Items
16 'Amy B ' Closed 100
10 'Erik C ' Closed 80
14 'Paul R ' Closed 20
17 'Chris K' Closed 40
19 'Ali I ' Closed 60
22 'Jenny A' Closed 40", header = TRUE, stringsAsFactors = FALSE)
df2<- read.table(text="ID Name Items
14 'Paul R ' 10
14 'Sarah K' 10
22 'Jenny A' 30
22 'Brian L' 10", header = TRUE, stringsAsFactors = FALSE)