Fill in missing rows in a data frame

Time: 2017-03-22 06:36:18

Tags: python r dataframe

I have a data frame that looks like this:

   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5    Red  Blue      10
6  Blond  Blue      30
7  Black Hazel      10
8  Blond Hazel       5

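In the data frame above, frequencies are recorded for four hair colors (Black, Brown, Red and Blond) across three eye colors (Brown, Blue and Hazel). However, not every hair color appears for every eye color. I want to fill in the missing hair color rows for each eye color with a frequency of 0, so that the result is the data frame below. Any help is appreciated.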

   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5  Black  Blue      0
6  Brown  Blue      0
7    Red  Blue      10
8  Blond  Blue      30
9  Black Hazel      10
10 Brown Hazel      0
11   Red Hazel      0
12 Blond Hazel      5

3 answers:

Answer 0 (score: 2)

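Use expand.grid to create a new table, df2, containing every combination of hair and eye color. Then use a join to bring the frequencies from df1 into df2. Finally, replace the resulting NAs with 0.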

library('data.table')
hair <- c('Black', 'Brown', 'Red', 'Blond')  # hair colors
eye <- c('Brown', 'Blue', 'Hazel')           # eye colors
df2 <- expand.grid(Hair = hair, Eye = eye)   # data frame with combinations of eye and hair colors
setDT(df2)[df1, `:=` (Freq = i.Freq), on = .(Hair, Eye)]  # join df2 with df1 based `on = .(Hair, Eye)` and bind `Freq` from df1 to df2
df2[is.na(Freq), Freq := 0 ]                # replace NA with 0

Output:

df2
#     Hair   Eye Freq
# 1: Black Brown   32
# 2: Brown Brown   53
# 3:   Red Brown   10
# 4: Blond Brown    3
# 5: Black  Blue    0
# 6: Brown  Blue    0
# 7:   Red  Blue   10
# 8: Blond  Blue   30
# 9: Black Hazel   10
# 10: Brown Hazel    0
# 11:   Red Hazel    0
# 12: Blond Hazel    5

Data:

df1 <- fread('id   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5    Red  Blue      10
6  Blond  Blue      30
7  Black Hazel      10
8  Blond Hazel       5')

df1[, id:=NULL]
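
The logic of this answer does not depend on data.table: build every Hair/Eye combination, look each one up in the observed data, and fall back to 0 when it is absent. Since the question is also tagged python, here is a minimal sketch of that logic in plain Python. The names `observed` and `rows` are made up for illustration and are not part of the answer above.

```python
from itertools import product

# Observed (Hair, Eye) -> Freq pairs, copied from the question.
observed = {
    ("Black", "Brown"): 32,
    ("Brown", "Brown"): 53,
    ("Red", "Brown"): 10,
    ("Blond", "Brown"): 3,
    ("Red", "Blue"): 10,
    ("Blond", "Blue"): 30,
    ("Black", "Hazel"): 10,
    ("Blond", "Hazel"): 5,
}

hair = ["Black", "Brown", "Red", "Blond"]  # hair colors
eye = ["Brown", "Blue", "Hazel"]           # eye colors

# product(eye, hair) varies hair fastest, which gives the same row order
# as expand.grid(Hair = hair, Eye = eye) in R.
rows = [(h, e, observed.get((h, e), 0)) for e, h in product(eye, hair)]

for h, e, freq in rows:
    print(h, e, freq)
```

Using `dict.get` with a default of 0 plays the role of the join followed by the NA replacement.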

Answer 1 (score: 2)

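A base R option is to use expand.grid to create another data frame with every combination of Hair and Eye, and then merge it with the original data frame.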

merge(expand.grid(Hair=unique(df$Hair),Eye=unique(df$Eye)), df[-1], all.x = TRUE)

#    Hair   Eye Freq
#1  Black  Blue   NA
#2  Black Brown   32
#3  Black Hazel   10
#4  Blond  Blue   30
#5  Blond Brown    3
#6  Blond Hazel    5
#7  Brown  Blue   NA
#8  Brown Brown   53
#9  Brown Hazel   NA
#10   Red  Blue   10
#11   Red Brown   10
#12   Red Hazel   NA

The result above contains NAs for the missing combinations, and we can easily convert those NAs to 0:

df1 <- merge(expand.grid(Hair = unique(df$Hair), Eye = unique(df$Eye)), df[-1],
             all.x = TRUE)
df1[is.na(df1)] <- 0
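
For readers who came here through the python tag, the same merge-based approach might look roughly like this in pandas. This is a sketch, not part of the original answer: the data frame `df` is rebuilt here without an id column, and `grid` and `merged` are illustrative names.

```python
import pandas as pd

# The eight observed rows from the question (no id column).
df = pd.DataFrame({
    "Hair": ["Black", "Brown", "Red", "Blond", "Red", "Blond", "Black", "Blond"],
    "Eye": ["Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Hazel", "Hazel"],
    "Freq": [32, 53, 10, 3, 10, 30, 10, 5],
})

# All Hair/Eye combinations as a plain two-column data frame
# (the counterpart of expand.grid).
grid = pd.MultiIndex.from_product(
    [df["Hair"].unique(), df["Eye"].unique()], names=["Hair", "Eye"]
).to_frame(index=False)

# A left merge keeps every combination; unmatched rows get NaN in Freq
# (the counterpart of merge(..., all.x = TRUE)).
merged = grid.merge(df, on=["Hair", "Eye"], how="left")

# Replace NaN with 0 and restore the integer type lost to NaN.
merged["Freq"] = merged["Freq"].fillna(0).astype(int)

print(merged)
```

As with the R version, the rows come out grouped by Hair rather than in the order shown in the question.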

Answer 2 (score: 1)

If we are using R, one option is complete from tidyr.

library(tidyr)
library(dplyr)  # provides %>% and arrange()
complete(df1, Hair, Eye, fill = list(Freq = 0)) %>%
      arrange(factor(Eye, levels = unique(df1$Eye)), factor(Hair, levels = unique(df1$Hair)))
# A tibble: 12 × 3
#    Hair   Eye  Freq
#   <chr> <chr> <dbl>
#1  Black Brown    32
#2  Brown Brown    53
#3    Red Brown    10
#4  Blond Brown     3
#5  Black  Blue     0
#6  Brown  Blue     0
#7    Red  Blue    10
#8  Blond  Blue    30
#9  Black Hazel    10
#10 Brown Hazel     0
#11   Red Hazel     0
#12 Blond Hazel     5
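
A rough pandas counterpart of complete(), for anyone arriving via the python tag, is to reindex against the full set of combinations. This is only a sketch under the assumption that each Hair/Eye pair occurs at most once; `df`, `full_index` and `filled` are illustrative names, not taken from the answer.

```python
import pandas as pd

# The eight observed rows from the question.
df = pd.DataFrame({
    "Hair": ["Black", "Brown", "Red", "Blond", "Red", "Blond", "Black", "Blond"],
    "Eye": ["Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Hazel", "Hazel"],
    "Freq": [32, 53, 10, 3, 10, 30, 10, 5],
})

# Every (Eye, Hair) pair in order of first appearance; Eye varies slowest,
# so rows stay grouped by eye color as in the desired output.
full_index = pd.MultiIndex.from_product(
    [df["Eye"].unique(), df["Hair"].unique()], names=["Eye", "Hair"]
)

filled = (
    df.set_index(["Eye", "Hair"])
      .reindex(full_index, fill_value=0)   # missing pairs get Freq = 0
      .reset_index()[["Hair", "Eye", "Freq"]]
)

print(filled)
```

Passing `fill_value=0` to `reindex` avoids the intermediate NaN step, so Freq keeps its integer type.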