I have a data frame that looks like this:
n = c(1, 1, 2, 4, 4, 5, 5, 7)
s = c("aa", "aa", "bb", "dd", "dd", "ee", "ee", "gg")
b = c("Feb", "Jan", "Mar", "Dec", "Mar", "Apr", "Jan", "Aug")
df = data.frame(n, s, b)
View(df)
n s b
1 1 aa Feb
2 1 aa Jan
3 2 bb Mar
4 4 dd Dec
5 4 dd Mar
6 5 ee Apr
7 5 ee Jan
8 7 gg Aug
I have a .csv reference table that I read into R with read.csv. It looks like this:
View(csv)
a s
1 1 aa
2 2 bb
3 3 cc
4 4 dd
5 5 ee
6 6 ff
7 7 gg
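I want to use csv as a reference table to add the missing entries, such as 3 cc and 6 ff, back into df. For each missing entry I want to insert a row containing the original values, followed by a second row labeled "Not Applicable" directly below it. The end result should look like this: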
n s b
1 1 aa Feb
2 1 aa Jan
3 2 bb Mar
4 3 cc
5 3 Not Applicable
6 4 dd Dec
7 4 dd Mar
8 5 ee Apr
9 5 ee Jan
10 6 ff
11 6 Not Applicable
12 7 gg Aug
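Does anyone know how to do this without manually adding the values to the original data frame? I would like it to detect the missing entries and add them automatically, because my real data is much larger than this. Thanks!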
Answer 0 (score: 1)
One solution could be to use dplyr:
library(dplyr)
anti_join(csv, select(df, -b), by=c("n", "s")) %>%
bind_rows(., mutate(., s = NA)) %>%
bind_rows(df) %>%
arrange(n)
# n s b
# 1 1 aa Feb
# 2 1 aa Jan
# 3 2 bb Mar
# 4 3 cc <NA>
# 5 3 <NA> <NA>
# 6 4 dd Dec
# 7 4 dd Mar
# 8 5 ee Apr
# 9 5 ee Jan
# 10 6 ff <NA>
# 11 6 <NA> <NA>
# 12 7 gg Aug
Data:
n = c(1, 1, 2, 4, 4, 5, 5, 7)
s = c("aa", "aa", "bb", "dd", "dd", "ee", "ee", "gg")
b = c("Feb", "Jan", "Mar", "Dec", "Mar", "Apr", "Jan", "Aug")
df = data.frame(n, s, b, stringsAsFactors = FALSE)
csv <- read.table(text = "n s
1 1 aa
2 2 bb
3 3 cc
4 4 dd
5 5 ee
6 6 ff
7 7 gg", header = TRUE, stringsAsFactors = FALSE)
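The pipeline above leaves NA in the s column of the extra rows, whereas the question asked for the label "Not Applicable". A minimal sketch of one way to get that label, assuming the reference table's key columns are named n and s as in the data above; the names missing_rows, placeholder_rows, and result are only illustrative:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  n = c(1, 1, 2, 4, 4, 5, 5, 7),
  s = c("aa", "aa", "bb", "dd", "dd", "ee", "ee", "gg"),
  b = c("Feb", "Jan", "Mar", "Dec", "Mar", "Apr", "Jan", "Aug"),
  stringsAsFactors = FALSE
)

# Reference table; stands in for the result of read.csv
# (assumption: its columns are named n and s)
csv <- data.frame(
  n = c(1, 2, 3, 4, 5, 6, 7),
  s = c("aa", "bb", "cc", "dd", "ee", "ff", "gg"),
  stringsAsFactors = FALSE
)

# Rows of the reference table with no matching (n, s) pair in df
missing_rows <- anti_join(csv, df, by = c("n", "s"))

# One extra row per missing key, carrying the label instead of NA
placeholder_rows <- mutate(missing_rows, s = "Not Applicable")

# Stack everything and sort by n; ties keep their stacking order,
# so each labeled row lands directly below the row it belongs to
result <- bind_rows(df, missing_rows, placeholder_rows) %>%
  arrange(n)
```

With the sample data this should give 12 rows, with b left as NA in the four added rows. If blanks are preferred over NA, something like mutate(b = coalesce(b, "")) could be appended to the pipeline.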