I have a data frame as shown below:
Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Red Blue 10
6 Blond Blue 30
7 Black Hazel 10
8 Blond Hazel 5
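In the data frame above, the frequencies of 4 hair colors (Black, Brown, Red and Blond) are recorded across the eye colors Brown, Blue and Hazel. Some hair and eye combinations are missing, though. I would like to fill in the missing hair colors for each eye color with a frequency of 0, to produce the data frame below. Any help is appreciated.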
Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Black Blue 0
6 Brown Blue 0
7 Red Blue 10
8 Blond Blue 30
9 Black Hazel 10
10 Brown Hazel 0
11 Red Hazel 0
12 Blond Hazel 5
Answer 0 (score: 2)
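Use expand.grid to create a new table, df2, containing every combination of hair and eye color. Then use a join to bring Freq from df1 into df2. Finally, replace the resulting NAs with 0.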
library('data.table')
hair <- c('Black', 'Brown', 'Red', 'Blond') # hair colors
eye <- c('Brown', 'Blue', 'Hazel') # eye colors
df2 <- expand.grid(Hair = hair, Eye = eye) # data frame with combinations of eye and hair colors
setDT(df2)[df1, `:=` (Freq = i.Freq), on = .(Hair, Eye)] # join df2 with df1 based `on = .(Hair, Eye)` and bind `Freq` from df1 to df2
df2[is.na(Freq), Freq := 0] # replace NA with 0
Output:
df2
# Hair Eye Freq
# 1: Black Brown 32
# 2: Brown Brown 53
# 3: Red Brown 10
# 4: Blond Brown 3
# 5: Black Blue 0
# 6: Brown Blue 0
# 7: Red Blue 10
# 8: Blond Blue 30
# 9: Black Hazel 10
# 10: Brown Hazel 0
# 11: Red Hazel 0
# 12: Blond Hazel 5
Data:
df1 <- fread('id Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Red Blue 10
6 Blond Blue 30
7 Black Hazel 10
8 Blond Hazel 5')
df1[, id:=NULL]
Answer 1 (score: 2)
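A base R option is to build a data frame of every Hair and Eye combination with expand.grid and merge it with the original data frame. Here df is the original data, and df[-1] drops its first column (assumed to be the row id).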
merge(expand.grid(Hair=unique(df$Hair),Eye=unique(df$Eye)), df[-1], all.x = TRUE)
# Hair Eye Freq
#1 Black Blue NA
#2 Black Brown 32
#3 Black Hazel 10
#4 Blond Blue 30
#5 Blond Brown 3
#6 Blond Hazel 5
#7 Brown Blue NA
#8 Brown Brown 53
#9 Brown Hazel NA
#10 Red Blue 10
#11 Red Brown 10
#12 Red Hazel NA
The result above contains NA for the missing combinations, and we can easily convert these NAs to 0:
df1 <- merge(expand.grid(Hair = unique(df$Hair), Eye = unique(df$Eye)), df[-1],
all.x = TRUE)
df1[is.na(df1)] <- 0
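Note that merge sorts its result by the key columns, so the rows above come out in alphabetical order rather than in the order requested in the question. A minimal self-contained sketch of restoring first-appearance order in base R follows. The data frame df here is a hypothetical reconstruction of the question's data without an id column, and the names grid and res are illustrative only.

```r
# Hypothetical reconstruction of the question's data (no id column)
df <- data.frame(
  Hair = c("Black", "Brown", "Red", "Blond", "Red", "Blond", "Black", "Blond"),
  Eye  = c("Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Hazel", "Hazel"),
  Freq = c(32, 53, 10, 3, 10, 30, 10, 5),
  stringsAsFactors = FALSE
)

# Every Hair x Eye combination, kept as character columns
grid <- expand.grid(Hair = unique(df$Hair), Eye = unique(df$Eye),
                    stringsAsFactors = FALSE)

# Left join onto the grid; combinations absent from df get NA, then 0
res <- merge(grid, df, by = c("Hair", "Eye"), all.x = TRUE)
res$Freq[is.na(res$Freq)] <- 0

# Reorder by first appearance of each Eye, then each Hair, in the original data
res <- res[order(match(res$Eye, unique(df$Eye)),
                 match(res$Hair, unique(df$Hair))), ]
rownames(res) <- NULL
res
```

Using match against unique(...) ranks each value by where it first occurs in the original data, which is the same idea as the factor(levels = unique(...)) trick, just without converting the columns to factors.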
Answer 2 (score: 1)
With tidyr, we can use complete. The arrange call (from dplyr) then restores the original order of the eye and hair colors.
library(tidyr)
library(dplyr) # needed for arrange()
complete(df1, Hair, Eye, fill = list(Freq = 0)) %>%
arrange(factor(Eye, levels = unique(df1$Eye)), factor(Hair, levels = unique(df1$Hair)))
# A tibble: 12 × 3
# Hair Eye Freq
# <chr> <chr> <dbl>
#1 Black Brown 32
#2 Brown Brown 53
#3 Red Brown 10
#4 Blond Brown 3
#5 Black Blue 0
#6 Brown Blue 0
#7 Red Blue 10
#8 Blond Blue 30
#9 Black Hazel 10
#10 Brown Hazel 0
#11 Red Hazel 0
#12 Blond Hazel 5