So I'm working on a problem in R where my data frame has a column containing a set of variable names:
*Name* *id_key* *detail* *var_names* *values*
Jose 123 red foo abc
Jose 123 blue foo abc
Jose 123 green foo abc
Mel 456 red bar 555
Mel 456 green bar 555
Dom 789 yellow choo fjfj55bar
What I want to end up with is this:
*Name* *id_key* *detail* *foo* *bar* *choo*
Jose 123 red abc NA NA
Jose 123 blue abc NA NA
Jose 123 green abc NA NA
Mel 456 red NA 555 NA
Mel 456 green NA 555 NA
Dom 789 yellow NA NA fjfj55bar
I tried dcast from the reshape2 package with the following command, but it did not produce the expected result:
toy_data_unmelt <- dcast(toy_data, formula = name~var_names, value.var = "values")
Any help is greatly appreciated!
Answer 0 (score: 1)
reshape2 has been superseded by tidyr. (reshape2 is still available, but I would switch so your code stays up to date.) Here is a tidyr solution:
library(tidyr)
library(readr)  # read_table() comes from readr, not tidyr
toy_data <- read_table("*Name* *id_key* *detail* *var_names* *values*
Jose 123 red foo abc
Jose 123 blue foo abc
Jose 123 green foo abc
Mel 456 red bar 555
Mel 456 green bar 555
Dom 789 yellow choo fjfj55bar")
toy_data_wide <- spread(toy_data, `*var_names*`, `*values*`)
Or, using the pipe operator:
toy_data_wide <- toy_data %>%
spread(`*var_names*`, `*values*`)
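In current tidyr (1.0.0 and later), spread() is itself superseded by pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming the data has plain column names without the asterisks (Name, id_key, detail, var_names, values):

```r
library(tidyr)

# Toy data from the question, assuming plain column names (no asterisks)
toy_data <- data.frame(
  Name      = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
  id_key    = c(123, 123, 123, 456, 456, 789),
  detail    = c("red", "blue", "green", "red", "green", "yellow"),
  var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
  values    = c("abc", "abc", "abc", "555", "555", "fjfj55bar"),
  stringsAsFactors = FALSE
)

# names_from supplies the new column names, values_from fills them;
# the remaining columns (Name, id_key, detail) identify each output row,
# and combinations with no value are filled with NA.
toy_data_wide <- pivot_wider(toy_data,
                             names_from  = var_names,
                             values_from = values)
toy_data_wide
```

By default pivot_wider() keeps the new columns in order of first appearance (foo, bar, choo), which matches the desired output in the question; pass names_sort = TRUE if you prefer them sorted alphabetically.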
Answer 1 (score: 1)
You need the spread function from the tidyr package:
library(tidyr)
toy_data = data.frame(Name = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
id_key = c(123, 123, 123, 456, 456, 789),
detail = c("red", "blue", "green", "red", "green", "yellow"),
var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
values = c("abc", "abc", "abc", "555", "555", "fjfj55bar"))
toy_data %>% spread(var_names, values, fill = NA)
Output:
# Name id_key detail bar choo foo
#1 Dom 789 yellow <NA> fjfj55bar <NA>
#2 Jose 123 blue <NA> <NA> abc
#3 Jose 123 green <NA> <NA> abc
#4 Jose 123 red <NA> <NA> abc
#5 Mel 456 green 555 <NA> <NA>
#6 Mel 456 red 555 <NA> <NA>
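For completeness, the dcast() call from the question can also be made to work. Its formula refers to name, while the column is Name, and it leaves id_key and detail off the left-hand side. With only Name there, Jose's three foo rows would land in a single cell, and dcast() would fall back to counting them with length. A sketch with all identifier columns on the left, assuming reshape2 is installed (it is retired but still on CRAN):

```r
library(reshape2)

# Toy data from the question, assuming plain column names (no asterisks)
toy_data <- data.frame(
  Name      = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
  id_key    = c(123, 123, 123, 456, 456, 789),
  detail    = c("red", "blue", "green", "red", "green", "yellow"),
  var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
  values    = c("abc", "abc", "abc", "555", "555", "fjfj55bar"),
  stringsAsFactors = FALSE
)

# Every column that identifies a row goes on the left of ~; the column
# holding the new variable names goes on the right. Each cell then holds
# exactly one value, so no aggregation function is needed.
toy_data_unmelt <- dcast(toy_data,
                         Name + id_key + detail ~ var_names,
                         value.var = "values")
toy_data_unmelt
```

Like spread(), dcast() sorts the rows by the left-hand-side variables and should give the new columns in alphabetical order (bar, choo, foo).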