I have a large dataset that looks like this:
  V1    V2       V3   V4
1 Sleep Domestic Eat  Child Care
2 Sleep Domestic Eat  Paid
3 Sleep Domestic Eat  Child Care
4 Sleep Eat      Paid <NA>
What I want to do is reorder the columns based on a "template":
["Sleep", "Eat", "Domestic", "Paid", "Child care"]
to get this output:
V1    V2  V3       V4   V5
Sleep Eat Domestic NA   Child Care
Sleep Eat Domestic Paid NA
Sleep Eat Domestic NA   Child Care
Sleep Eat NA       Paid NA
So column 1 holds Sleep, column 2 holds Eat, and so on.
I don't know where to start. Any ideas?
Data
x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic",
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1",
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")
template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')
Answer 0 (score: 3)
For each value in template, use rowSums to check whether it appears anywhere in each row, then piece the results back together:
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
# I've fixed the template so that the case of 'Child Care' matches the values in the data
data.frame(lapply(
setNames(template, seq_along(template)),
function(v) c(NA,v)[(rowSums(x==v,na.rm=TRUE)>0)+1]
))
# X1 X2 X3 X4 X5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat <NA> Paid <NA>
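The indexing step `c(NA, v)[... + 1]` may be easier to follow on a single template value. The sketch below uses a two-row toy data frame that I made up for illustration (it is not the question's data): the logical test becomes 1 or 2 after adding 1, and that number picks either NA or the value itself.

```r
# Toy data, invented for illustration only.
x <- data.frame(V1 = c("Sleep", "Sleep"),
                V2 = c("Domestic", "Eat"),
                V3 = c("Eat", NA),
                stringsAsFactors = FALSE)
v <- "Domestic"

# TRUE for rows that contain v in any column; NA cells are ignored.
hit <- rowSums(x == v, na.rm = TRUE) > 0

# FALSE + 1 is 1 and TRUE + 1 is 2, so this selects NA or v per row.
out <- c(NA, v)[hit + 1]
print(out)
```

Running this once per template value, as the `lapply` call does, yields one output column per value.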
Or an alternative using pmax:
data.frame(
lapply(
setNames(template, seq_along(template)),
function(v) do.call(pmax, c(replace(x, x != v,NA),na.rm=TRUE))
)
)
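To see why pmax works here, it can help to look at one template value in isolation. This is only a sketch on a made-up two-row data frame: every cell that is not equal to the value is blanked to NA, and pmax with na.rm = TRUE then collapses each row to the single surviving value, or to NA if nothing survived.

```r
# Toy data, invented for illustration only.
x <- data.frame(V1 = c("Sleep", "Sleep"),
                V2 = c("Domestic", "Eat"),
                V3 = c("Eat", NA),
                stringsAsFactors = FALSE)
v <- "Domestic"

# Blank out every cell that differs from v.
masked <- replace(x, x != v, NA)

# Row-wise maximum across the columns, skipping NA cells.
collapsed <- do.call(pmax, c(masked, na.rm = TRUE))
print(collapsed)
```

Because at most one distinct value is left per row after masking, the "maximum" is just that value.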
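Answer 1 (score: 2)
A reshape2 and dplyr solution. Clearly not as compact as the others. The idea is to melt (make the data tall), turn the values into an ordered factor, and then cast back to wide.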
library(reshape2)
library(dplyr)
# make an id column
x$id <- row.names(x)
# make a tall result id, var, value
tall <- x %>%
melt(id.vars="id") %>%
select(id, value)
# make an ordered factor with the template
tall$value <- factor(tall$value, levels=template, ordered = TRUE)
# make wide result with dcast
result <- tall %>%
filter(!is.na(value)) %>% # drop the NAs
mutate(var = value) %>% # name the column the same as the value
dcast(id ~ var) # make into wide format
result
# id Sleep Eat Domestic Paid Child Care
#1 1 Sleep Eat Domestic <NA> Child Care
#2 2 Sleep Eat Domestic Paid <NA>
#3 3 Sleep Eat Domestic <NA> Child Care
#4 4 Sleep Eat <NA> Paid <NA>
Answer 2 (score: 2)
Here is a tidyverse option:
library(dplyr)
library(tidyr)
library(tibble)
rownames_to_column(x, 'id') %>%
gather(Var, Val, -id, na.rm = TRUE) %>%
mutate(Var = factor(Val, levels = template)) %>%
spread(Var, Val) %>%
select(-id) %>%
setNames(., paste0("V", seq_along(template)))
# V1 V2 V3 V4 V5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat <NA> Paid <NA>
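All of the approaches above match the template against the data exactly, including case, so the question's original 'Child care' would silently produce NA for every "Child Care" cell. A quick sanity check along the following lines may be worth running first. This is a sketch using a trimmed copy of the question's data, and the names `vals` and `unmatched` are my own.

```r
# Two rows from the question's data, rebuilt with data.frame().
x <- data.frame(V1 = c("Sleep", "Sleep"),
                V2 = c("Domestic", "Eat"),
                V3 = c("Eat", "Paid"),
                V4 = c("Child Care", NA),
                stringsAsFactors = FALSE)
# The template as originally posted, with a lower-case "care".
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child care")

# Distinct non-missing values that occur anywhere in the data.
vals <- unlist(x, use.names = FALSE)
vals <- unique(vals[!is.na(vals)])

# Values in the data that have no exact match in the template.
unmatched <- setdiff(vals, template)
print(unmatched)
```

An empty result means every value in the data has a column waiting for it.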