I have a data frame like this:
Title Female Male Asian HispanicLatino White
1 Title1 2 3 1 1 3
2 Title2 1 5 NA 1 5
3 Title3 NA 2 NA NA 2
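I want to expand this so that Title1 has two rows marked "F" and three rows marked "M", while carrying the other columns along (my real data has more columns).
I have tried various approaches. The following technically works, but it is not ideal.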
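library(tidyr)  # uncount()
library(plyr)   # rbind.fill()
df[is.na(df)] <- 0
# one copy of each row per male, then per female
dfM <- uncount(df, df$Male)
dfM$Sex <- "M"
dfF <- uncount(df, df$Female)
dfF$Sex <- "F"
df <- rbind.fill(dfF, dfM)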
This produces:
Title Female Male Asian HispanicLatino White Sex
1 Title1 2 3 1 1 3 F
2 Title1 2 3 1 1 3 F
3 Title2 1 5 0 1 5 F
4 Title1 2 3 1 1 3 M
5 Title1 2 3 1 1 3 M
6 Title1 2 3 1 1 3 M
7 Title2 1 5 0 1 5 M
8 Title2 1 5 0 1 5 M
9 Title2 1 5 0 1 5 M
10 Title2 1 5 0 1 5 M
11 Title2 1 5 0 1 5 M
12 Title3 0 2 0 0 2 M
13 Title3 0 2 0 0 2 M
I am wondering whether there is a simpler way to do this.
Here is some data:
dput(df)
structure(list(Title = structure(1:3, .Label = c("Title1",
"Title2", "Title3"), class = "factor"), Female = c(2L, 1L, NA
), Male = c(3L, 5L, 2L), Asian = c(1L, NA, NA), HispanicLatino = c(1L,
1L, NA), White = c(3L, 5L, 2L)), .Names = c("Title", "Female",
"Male", "Asian", "HispanicLatino", "White"), class = "data.frame", row.names = c(NA,
-3L))
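Answer 0 (score: 2)
We can loop over the columns with map (map_df here) to expand the rows, after using replace (replace_na here) to set NA to 0, and create the .id column ("Sex") along the way.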
library(tidyverse)
map_df(setNames(c("Female", "Male"), c("F", "M")), ~
    df %>%
      mutate_at(vars(.x), replace_na, 0) %>%
      uncount(!! rlang::sym(.x), .remove = FALSE), .id = 'Sex') %>%
  mutate_at(3:6, replace_na, 0)
# Sex Title Female Male Asian HispanicLatino White
#1 F Title1 2 3 1 1 3
#2 F Title1 2 3 1 1 3
#3 F Title2 1 5 0 1 5
#4 M Title1 2 3 1 1 3
#5 M Title1 2 3 1 1 3
#6 M Title1 2 3 1 1 3
#7 M Title2 1 5 0 1 5
#8 M Title2 1 5 0 1 5
#9 M Title2 1 5 0 1 5
#10 M Title2 1 5 0 1 5
#11 M Title2 1 5 0 1 5
#12 M Title3 0 2 0 0 2
#13 M Title3 0 2 0 0 2
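As a possible alternative to looping, the two count columns can be reshaped to long form first and then expanded in a single step. This is only a sketch of a different technique, not the approach shown above: it uses tidyr's pivot_longer and uncount, the column name n and the object names long and expanded are made up for illustration, the Female and Male count columns are not kept, the labels are "Female"/"Male" rather than "F"/"M", and the rows come out grouped by Title rather than by sex.

```r
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Title = c("Title1", "Title2", "Title3"),
  Female = c(2L, 1L, NA),
  Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA),
  HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)

# One row per Title/Sex pair; pairs whose count is NA are dropped
long <- pivot_longer(
  df,
  cols = c(Female, Male),
  names_to = "Sex",
  values_to = "n",        # hypothetical name for the count column
  values_drop_na = TRUE
)

# Repeat each row n times; uncount() removes the n column by default
expanded <- uncount(long, n)
```

Because the NA counts are dropped rather than replaced, the NA values in the other columns (for example Asian) are left untouched here.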
Answer 1 (score: 0)
Using base R, we can use rep to repeat the rows after converting NA to 0.
df[is.na(df)] <- 0 #Don't use this line if NA needed in final output.
df[rep(seq_len(nrow(df)), rowSums(df[c("Female", "Male")])), ]
# Title Female Male Asian HispanicLatino White
#1 Title1 2 3 1 1 3
#1.1 Title1 2 3 1 1 3
#1.2 Title1 2 3 1 1 3
#1.3 Title1 2 3 1 1 3
#1.4 Title1 2 3 1 1 3
#2 Title2 1 5 0 1 5
#2.1 Title2 1 5 0 1 5
#2.2 Title2 1 5 0 1 5
#2.3 Title2 1 5 0 1 5
#2.4 Title2 1 5 0 1 5
#2.5 Title2 1 5 0 1 5
#3 Title3 0 2 0 0 2
#3.1 Title3 0 2 0 0 2
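NA is replaced with 0 here because all NA values are converted to 0 in the final output. If we want NA to stay NA in the final output, we can skip the replacement and use na.rm = TRUE in rowSums instead.
If the order of the rows matters, we can repeat the rows for each column separately. We can also remove the row names.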
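The na.rm = TRUE variant might look like the sketch below. The data frame is rebuilt from the question so the example runs on its own; the names counts and expanded are made up for illustration.

```r
# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Title = c("Title1", "Title2", "Title3"),
  Female = c(2L, 1L, NA),
  Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA),
  HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)

# Total repeats per row; na.rm = TRUE treats a missing count as 0
counts <- rowSums(df[c("Female", "Male")], na.rm = TRUE)

# Repeat each row index by its total; NA values in df are left as they are
expanded <- df[rep(seq_len(nrow(df)), counts), ]
rownames(expanded) <- NULL
```

With this version df itself is never modified, so the NA entries in Female, Asian and HispanicLatino carry through to the expanded rows.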
row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, df$Female), rep(row_inds, df$Male)), ]
rownames(df1) <- NULL
df1
# Title Female Male Asian HispanicLatino White
#1 Title1 2 3 1 1 3
#2 Title1 2 3 1 1 3
#3 Title2 1 5 0 1 5
#4 Title1 2 3 1 1 3
#5 Title1 2 3 1 1 3
#6 Title1 2 3 1 1 3
#7 Title2 1 5 0 1 5
#8 Title2 1 5 0 1 5
#9 Title2 1 5 0 1 5
#10 Title2 1 5 0 1 5
#11 Title2 1 5 0 1 5
#12 Title3 0 2 0 0 2
#13 Title3 0 2 0 0 2
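The output above has no Sex column. Since the female repeats come first and the male repeats second, one way to add it is to build the label vector from the two totals. This is a sketch under that assumption; female_n and male_n are hypothetical helper names, and NA counts are treated as 0 without changing the rest of the data.

```r
# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Title = c("Title1", "Title2", "Title3"),
  Female = c(2L, 1L, NA),
  Male = c(3L, 5L, 2L),
  Asian = c(1L, NA, NA),
  HispanicLatino = c(1L, 1L, NA),
  White = c(3L, 5L, 2L)
)

# Repeat counts with NA treated as 0 (hypothetical helper vectors)
female_n <- ifelse(is.na(df$Female), 0L, df$Female)
male_n <- ifelse(is.na(df$Male), 0L, df$Male)

# Female repeats first, then male repeats, as in the answer above
row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, female_n), rep(row_inds, male_n)), ]
rownames(df1) <- NULL

# Label the first block "F" and the second block "M"
df1$Sex <- rep(c("F", "M"), c(sum(female_n), sum(male_n)))
```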