Expand a data frame based on column values

Time: 2019-06-23 20:56:34

Tags: r

I have a data frame like this:

       Title Female Male Asian HispanicLatino White
    1 Title1      2    3     1              1     3
    2 Title2      1    5    NA              1     5
    3 Title3     NA    2    NA             NA     2

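I want to expand this so that Title1 gets two rows for "Female" and three rows for "Male", with the other columns (I have quite a few more) repeated on each new row.

I have tried various approaches. The following technically works, but it isn't ideal: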

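    library(tidyr)  # uncount()
    library(plyr)   # rbind.fill()

    df[is.na(df)] <- 0

    # One copy of each row per male, labelled "M"
    dfM <- uncount(df, df$Male)
    dfM$Sex <- "M"

    # One copy of each row per female, labelled "F"
    dfF <- uncount(df, df$Female)
    dfF$Sex <- "F"

    df <- rbind.fill(dfF, dfM)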

which produces:

        Title Female Male Asian HispanicLatino White Sex
    1  Title1      2    3     1              1     3   F
    2  Title1      2    3     1              1     3   F
    3  Title2      1    5     0              1     5   F
    4  Title1      2    3     1              1     3   M
    5  Title1      2    3     1              1     3   M
    6  Title1      2    3     1              1     3   M
    7  Title2      1    5     0              1     5   M
    8  Title2      1    5     0              1     5   M
    9  Title2      1    5     0              1     5   M
    10 Title2      1    5     0              1     5   M
    11 Title2      1    5     0              1     5   M
    12 Title3      0    2     0              0     2   M
    13 Title3      0    2     0              0     2   M

Is there a simpler way to do this?

Here is some data:

dput(df)
structure(list(Title = structure(1:3, .Label = c("Title1", 
"Title2", "Title3"), class = "factor"), Female = c(2L, 1L, NA
), Male = c(3L, 5L, 2L), Asian = c(1L, NA, NA), HispanicLatino = c(1L, 
1L, NA), White = c(3L, 5L, 2L)), .Names = c("Title", "Female", 
"Male", "Asian", "HispanicLatino", "White"), class = "data.frame", row.names = c(NA, 
-3L))

2 answers:

Answer 0 (score: 2)

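We can use `map` to loop over the count columns (`Female`, `Male`): for each one, `replace` its `NA`s with 0 and expand the rows with `uncount`, while `.id` creates the "Sex" column from the names of the vector.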

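library(tidyverse)
# The names of the vector ("F", "M") become the values of the .id column 'Sex'
map_df(setNames(c("Female", "Male"), c("F", "M")), ~ 
       df %>%
           mutate_at(vars(.x), replace_na, 0) %>%   # NA count -> 0 for this column
           uncount(!! rlang::sym(.x), .remove = FALSE), .id = 'Sex') %>%
       mutate_at(3:6, replace_na, 0)                # remaining NAs in Female:HispanicLatino -> 0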
#   Sex  Title Female Male Asian HispanicLatino White
#1    F Title1      2    3     1              1     3
#2    F Title1      2    3     1              1     3
#3    F Title2      1    5     0              1     5
#4    M Title1      2    3     1              1     3
#5    M Title1      2    3     1              1     3
#6    M Title1      2    3     1              1     3
#7    M Title2      1    5     0              1     5
#8    M Title2      1    5     0              1     5
#9    M Title2      1    5     0              1     5
#10   M Title2      1    5     0              1     5
#11   M Title2      1    5     0              1     5
#12   M Title3      0    2     0              0     2
#13   M Title3      0    2     0              0     2
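For comparison, the same loop-over-columns idea can be sketched in base R without the tidyverse. This is not part of the answer above, just a sketch under two assumptions: `NA` counts mean 0, and `sex_cols` is a hypothetical name for the named vector whose names play the role of `.id`. Adding more count columns only means extending `sex_cols`.

```r
# Example data from the question
df <- data.frame(
  Title = factor(c("Title1", "Title2", "Title3")),
  Female = c(2L, 1L, NA), Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA), HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)
df[is.na(df)] <- 0  # assumption: a missing count means 0

# Hypothetical helper vector: names are the Sex labels, values the count columns
sex_cols <- c(F = "Female", M = "Male")

# For each count column, repeat every row that many times and tag it
pieces <- lapply(names(sex_cols), function(s) {
  counts <- df[[sex_cols[[s]]]]
  out <- df[rep(seq_len(nrow(df)), counts), ]
  out$Sex <- rep(s, nrow(out))
  out
})

res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```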

Answer 1 (score: 0)

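Using base R, we can replace `NA` with 0 and then use `rep` to repeat each row as many times as the sum of its `Female` and `Male` counts.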

df[is.na(df)] <- 0 #Don't use this line if NA needed in final output.
df[rep(seq_len(nrow(df)), rowSums(df[c("Female", "Male")])), ]

#     Title Female Male Asian HispanicLatino White
#1   Title1      2    3     1              1     3
#1.1 Title1      2    3     1              1     3
#1.2 Title1      2    3     1              1     3
#1.3 Title1      2    3     1              1     3
#1.4 Title1      2    3     1              1     3
#2   Title2      1    5     0              1     5
#2.1 Title2      1    5     0              1     5
#2.2 Title2      1    5     0              1     5
#2.3 Title2      1    5     0              1     5
#2.4 Title2      1    5     0              1     5
#2.5 Title2      1    5     0              1     5
#3   Title3      0    2     0              0     2
#3.1 Title3      0    2     0              0     2
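To make the indexing step concrete, here is the `rep(seq_len(nrow(df)), ...)` part on its own, using the per-row counts from the example data (Female + Male, with `NA` taken as 0). Each row number is repeated as many times as its count, and indexing the data frame with that vector duplicates the rows:

```r
# Female + Male per row of the example data, with NA counted as 0
counts <- c(2 + 3, 1 + 5, 0 + 2)

# Each row index i is repeated counts[i] times
idx <- rep(seq_len(length(counts)), counts)
print(idx)
#> [1] 1 1 1 1 1 2 2 2 2 2 2 3 3
```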

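Here `NA` is replaced with 0 because the expected output shows every `NA` as 0. If the `NA`s should stay `NA` in the final output, skip that line and use `na.rm = TRUE` in `rowSums` instead.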
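A sketch of that `na.rm = TRUE` variant, assuming the original data with its `NA`s still in place: missing counts are ignored only when computing the number of repetitions, so the other columns keep their `NA` values.

```r
# Example data from the question, NAs left as they are
df <- data.frame(
  Title = factor(c("Title1", "Title2", "Title3")),
  Female = c(2L, 1L, NA), Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA), HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)

# NA counts are skipped when summing; the data themselves are not modified
counts <- rowSums(df[c("Female", "Male")], na.rm = TRUE)
out <- df[rep(seq_len(nrow(df)), counts), ]
out
```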

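If the order of the rows matters, we can call `rep` separately for each column (all the `Female` repetitions first, then the `Male` ones). We can also drop the row names.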

row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, df$Female), rep(row_inds, df$Male)), ]
rownames(df1) <- NULL
df1

#   Title Female Male Asian HispanicLatino White
#1  Title1      2    3     1              1     3
#2  Title1      2    3     1              1     3
#3  Title2      1    5     0              1     5
#4  Title1      2    3     1              1     3
#5  Title1      2    3     1              1     3
#6  Title1      2    3     1              1     3
#7  Title2      1    5     0              1     5
#8  Title2      1    5     0              1     5
#9  Title2      1    5     0              1     5
#10 Title2      1    5     0              1     5
#11 Title2      1    5     0              1     5
#12 Title3      0    2     0              0     2
#13 Title3      0    2     0              0     2
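Because all the Female copies come before the Male copies in this ordering, a `Sex` column like the one in the question can be appended with one more `rep` call. This is an extension sketched here, not part of the original answer, and it assumes `NA` counts are treated as 0 as above:

```r
# Example data from the question
df <- data.frame(
  Title = factor(c("Title1", "Title2", "Title3")),
  Female = c(2L, 1L, NA), Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA), HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)
df[is.na(df)] <- 0

row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, df$Female), rep(row_inds, df$Male)), ]

# The first sum(Female) rows are the female copies, the rest the male copies
df1$Sex <- rep(c("F", "M"), c(sum(df$Female), sum(df$Male)))
rownames(df1) <- NULL
df1
```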