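I often work with data frames that have a column of string values that need to be split apart. They come from a "select multiple" option in the data entry program, which unfortunately I can't change. I have tried tidyr::separate, but it doesn't put the pieces into the right columns. An example: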
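library(tidyr)
df = data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"))
# separate() fills the new columns by position, so "malaria" in row 2
# lands in the "diarrhoea" column instead of the "malaria" column
df <- df %>%
  separate(sick, c("diarrhoea", "cough", "malaria"),
           sep = " ", fill = "right", remove = FALSE)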
But I would like the result to look like this:
df2 = data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  diarrhoea = c(NA, NA, "diarrhoea"),
  cough = c(NA, NA, NA),
  malaria = c(NA, "malaria", "malaria"))
Any help pointing me in the right direction would be much appreciated.
Answer 0 (score: 1)
We can try separate_rows together with dcast:
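library(tidyr)
library(reshape2)
library(dplyr)
# df is the original data frame from the question (before separate() was applied)
separate_rows(df, sick) %>%   # one row per x/answer pair
  # fixed factor levels set the column order and keep the unused "cough" level
  mutate(sick = factor(sick, levels = c("diarrhoea", "cough", "malaria")), sick1 = sick) %>%
  dcast(., x ~ sick, value.var = "sick1", drop = FALSE) %>%
  bind_cols(., df[2]) %>%     # add back the original sick column
  select(x, sick, diarrhoea, cough, malaria)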
# x sick diarrhoea cough malaria
#1 1 <NA> <NA> <NA> <NA>
#2 2 malaria <NA> <NA> malaria
#3 3 diarrhoea malaria diarrhoea <NA> malaria
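As a hedged alternative that is not part of the original answer: reshape2 is no longer actively developed, and the same "split to long format, then spread back to wide" idea can be sketched with tidyr alone via pivot_wider(). This assumes tidyr >= 1.2.0 (for the names_expand argument) and that the illustrative vector symptoms lists every possible answer in the desired column order; the names symptoms, wide and result are hypothetical.

```r
library(dplyr)
library(tidyr)  # pivot_wider(names_expand = ) requires tidyr >= 1.2.0

df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)

# Hypothetical fixed vocabulary of answers, in the desired column order
symptoms <- c("diarrhoea", "cough", "malaria")

wide <- df %>%
  separate_rows(sick, sep = " ") %>%   # long format: one row per (x, answer)
  filter(!is.na(sick)) %>%             # empty rows are restored by the join below
  mutate(value = sick,                 # keep the answer text as the cell value
         sick = factor(sick, levels = symptoms)) %>%
  pivot_wider(names_from = sick, values_from = value,
              names_expand = TRUE)     # also creates the unobserved "cough" column

# Join back onto the original rows so x = 1 (no answer) reappears with NAs
result <- left_join(df, wide, by = "x")
print(result)
```

Because names_from is a factor and names_expand = TRUE, the new columns follow the factor levels, which is what keeps "cough" in place even though nobody selected it.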
Another option is to use cSplit from splitstackshape together with dcast from data.table.
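The answer's own code for this option did not survive, so the following is only a sketch of how cSplit and dcast could be combined, not the answerer's code. It assumes the cSplit() arguments direction = "long" and type.convert = FALSE, and relies on data.table's dcast(drop = FALSE) to create columns for unused factor levels; symptoms, long, wide and result are hypothetical names.

```r
library(data.table)
library(splitstackshape)

df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)
# Hypothetical fixed vocabulary of answers, in the desired column order
symptoms <- c("diarrhoea", "cough", "malaria")

# cSplit() returns a data.table; "long" puts each space-separated answer on its own row
long <- cSplit(df, "sick", sep = " ", direction = "long", type.convert = FALSE)
long <- long[!is.na(sick)]                       # drop empty answers; the merge restores them
long[, sick1 := as.character(sick)]              # the answer text becomes the cell value
long[, sick := factor(sick, levels = symptoms)]  # levels fix the column order

# drop = FALSE includes missing combinations, so the unused "cough" level still gets a column
wide <- dcast(long, x ~ sick, value.var = "sick1", drop = FALSE)

# Merge back onto the original rows so x = 1 (no answer) reappears with NAs
result <- merge(as.data.table(df), wide, by = "x", all.x = TRUE)
print(result)
```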