I know about the spread function in the tidyr package, but I can't get it to do what I want.
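I have a data.frame with 2 columns, shown below. I need to turn the Subject column into binary columns, one per subject, holding 1s and 0s.
Here is the data.frame: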
studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
> studentInfo
StudentID Subject
1 1 Maths
2 1 Science
3 1 English
4 2 Maths
5 3 History
6 3 History
The output I expect is:
StudentID Maths Science English History
1 1 1 1 1 0
2 2 1 0 0 0
3 3 0 0 0 1
Please help me do this with the spread function or any other function. Thanks.
Answer 0 (score: 9)
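With reshape2 we can use dcast to go from long to wide. Since you only want a binary result, we first take the unique rows of the data, so that a duplicated row (student 3 has History twice) is not counted as 2: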
library(reshape2)
si <- unique(studentInfo)
dcast(si, formula = StudentID ~ Subject, fun.aggregate = length)
# StudentID English History Maths Science
#1 1 1 0 1 1
#2 2 0 0 1 0
#3 3 0 1 0 0
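To see why the unique step matters, here is a small base-R sketch. It uses table() only to count rows per (StudentID, Subject) pair, which is the same count that fun.aggregate = length produces in each dcast cell; it does not call reshape2 itself.

```r
studentInfo <- data.frame(StudentID = c(1, 1, 1, 2, 3, 3),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

# Count rows per (StudentID, Subject) pair on the raw data:
# student 3 has two History rows, so that cell is 2
raw_counts <- table(studentInfo$StudentID, studentInfo$Subject)

# After unique() the duplicated row is gone, so every cell is 0 or 1
dedup <- unique(studentInfo)
dedup_counts <- table(dedup$StudentID, dedup$Subject)

raw_counts["3", "History"]    # 2
dedup_counts["3", "History"]  # 1
```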
Another approach, using tidyr and dplyr:
library(tidyr)
library(dplyr)
studentInfo %>%
mutate(yesno = 1) %>%
distinct %>%
spread(Subject, yesno, fill = 0)
# StudentID English History Maths Science
#1 1 1 0 1 1
#2 2 0 0 1 0
#3 3 0 1 0 0
Although I'm not a fan of the tidyr syntax yet...
Answer 1 (score: 7)
We can use table from base R:
+(table(studentInfo)!=0)
# Subject
#StudentID English History Maths Science
# 1 1 0 1 1
# 2 0 0 1 0
# 3 0 1 0 0
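The result above is a table object with the student IDs as row names. If a data.frame with an explicit StudentID column is needed (as in the expected output), one possible sketch is to convert it with as.data.frame.matrix(); the names m and wide are just illustrative.

```r
studentInfo <- data.frame(StudentID = c(1, 1, 1, 2, 3, 3),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

# 1 where the student has at least one row for the subject, else 0
m <- +(table(studentInfo) != 0)

# Move the row names (student IDs) into a real column and reset row names
wide <- data.frame(StudentID = as.numeric(rownames(m)),
                   as.data.frame.matrix(m),
                   row.names = NULL)
wide
```

Columns come out in alphabetical order (English, History, Maths, Science), because table() sorts the subject levels.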
Answer 2 (score: 2)
Using tidyr:
library(tidyr)
studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
pivot_wider(studentInfo,
names_from = "Subject",
values_from = 'Subject',
values_fill = list(Subject=0),
values_fn = list(Subject = ~+(as.logical(length(.)))))
#> # A tibble: 3 x 5
#> StudentID Maths Science English History
#> <dbl> <int> <int> <int> <int>
#> 1 1 1 1 1 0
#> 2 2 1 0 0 0
#> 3 3 0 0 0 1
Created on 2019-09-19 by the reprex package (v0.3.0)
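A possibly simpler pivot_wider variant follows the same idea as the spread answer: drop duplicate rows, add a constant indicator column, and fill missing cells with 0. This sketch assumes tidyr 1.1.0 or later, where values_fill accepts a single value (older versions need values_fill = list(present = 0)); the column name present is arbitrary.

```r
library(tidyr)

studentInfo <- data.frame(StudentID = c(1, 1, 1, 2, 3, 3),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

# Drop the duplicated (3, History) row, then add a constant indicator column
flagged <- unique(studentInfo)
flagged$present <- 1

# One column per subject; students with no row for a subject get 0
wide <- pivot_wider(flagged,
                    names_from = Subject,
                    values_from = present,
                    values_fill = 0)
wide
```

Subject columns appear in order of first appearance in the data, as in the output above.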