I'm getting started with tidyr and dplyr. I have the following data frame:
         email Assignment Stage Grade
1 foo1@bar.com     course final 86.28
2 foo2@bar.com     course first 68.87
3 foo3@bar.com     course resub 38.06
4 foo3@bar.com     course final 77.41
...
I want to restructure this so that the single Grade column is split into three columns, one for each value of Stage (first, resub or final):
         email Assignment first resub final
1 foo1@bar.com     course 100.0 100.0 100.0
2 foo2@bar.com     course 100.0 100.0 100.0
3 foo3@bar.com     course 100.0 100.0 100.0
4 foo3@bar.com     course 100.0 100.0 100.0
(The data obviously doesn't match up because of cut and paste.)
I'm stumped. I think I need the separate() function, but how do I use it?
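Answer 0 (score: 1)
The spread() function from tidyr gives you the result you want. It turns each distinct value of Stage into its own column and fills it with the matching Grade.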
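library(tidyr)  # spread()
library(dplyr)  # %>%
# Rebuild the example data from the question
email <- c("foo1@bar.com","foo2@bar.com","foo3@bar.com","foo3@bar.com")
Assignment <- rep("course",4)
Stage <- c("final","first","resub","final")
Grade <- c(86.28,68.87,38.06,77.41)
df <- data.frame(email,Assignment,Stage,Grade,stringsAsFactors = FALSE)
# One column per Stage value; stages a student has no grade for become NA
df <- df %>%
  spread(Stage, Grade)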
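In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same reshaping can probably be written as below. This is only a sketch under that version assumption: the name `wide` is made up for illustration, and the select() step is there only to get the first, resub, final column order shown in the question.

```r
library(tidyr)  # pivot_wider(), assumes tidyr >= 1.0.0
library(dplyr)  # %>% and select()

# Same example data as in the question
df <- data.frame(
  email = c("foo1@bar.com", "foo2@bar.com", "foo3@bar.com", "foo3@bar.com"),
  Assignment = rep("course", 4),
  Stage = c("final", "first", "resub", "final"),
  Grade = c(86.28, 68.87, 38.06, 77.41),
  stringsAsFactors = FALSE
)

# `wide` is a hypothetical name used only in this sketch
wide <- df %>%
  # one new column per distinct Stage value, filled from Grade
  pivot_wider(names_from = Stage, values_from = Grade) %>%
  # reorder the new columns to match the layout asked for in the question
  select(email, Assignment, first, resub, final)

print(wide)
```

Each email/Assignment pair ends up on one row, and any stage without a grade for that student is NA.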