dcast where the values are text

Time: 2019-04-26 18:07:30

Tags: r tidyr

I am looking for a way to spread or cast a data.frame in which the values are text strings.

df = data.frame(employeeid = c(1,1,2,2),
                question=c('do you like milk?', 'do you like apples?', 'do you like milk?', 'do you like apples?'),
                Answer=c('Yes','No','No','No'))

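I would like to convert this to a wide format in which the column headers are the employee ID and the questions. I have tried df = spread(df, question, Answer), but that does not seem to work.

1 answer:

Answer 0: (score: 1)

Since you have dcast in your title, I am assuming data.table: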

data.table::dcast(question ~ employeeid, data = df, value.var = "Answer")
#              question   1  2
# 1 do you like apples?  No No
# 2   do you like milk? Yes No
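Depending on the installed version, data.table::dcast may warn or stop when it is handed a plain data.frame rather than a data.table. The following is a minimal sketch of the same reshape with an explicit conversion first; the names dt and wide are illustrative only, and stringsAsFactors = FALSE is an assumption added so the answers stay plain text on any R version.

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  employeeid = c(1, 1, 2, 2),
  question = c("do you like milk?", "do you like apples?",
               "do you like milk?", "do you like apples?"),
  Answer = c("Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Convert to a data.table so the data.table method of dcast is used
dt <- as.data.table(df)

# One row per question, one column per employee ID
wide <- dcast(dt, question ~ employeeid, value.var = "Answer")
wide
```

Swapping the formula to employeeid ~ question would give one row per employee instead.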

Another option:

tidyr::spread(df, employeeid, Answer)
#              question   1  2
# 1 do you like apples?  No No
# 2   do you like milk? Yes No
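The question asks for the questions as column headers, with one row per employee. A possible sketch of that orientation, assuming a tidyr version that provides pivot_wider() (1.0.0 or later); the name wide is illustrative only.

```r
library(tidyr)

# Sample data from the question; stringsAsFactors = FALSE keeps answers as text
df <- data.frame(
  employeeid = c(1, 1, 2, 2),
  question = c("do you like milk?", "do you like apples?",
               "do you like milk?", "do you like apples?"),
  Answer = c("Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# One row per employee, one column per question, cells filled from Answer
wide <- pivot_wider(df, names_from = question, values_from = Answer)
wide
```

The older equivalent would be spread(df, question, Answer), which is the call the question already tried.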

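Edit: since there appear to be duplicates in your data, you can pick the most frequent answer for each cell like this:

# sort() orders counts ascending by default, so request decreasing order
# to put the most frequent value first
most <- function(x) names(sort(table(x), decreasing = TRUE))[1]
data.table::dcast(question~employeeid, data=df, value.var="Answer", fun.aggregate = most)
#              question   1  2
# 1 do you like apples?  No No
# 2   do you like milk? Yes No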
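To see the "most frequent answer" idea on data that really does contain duplicates, here is a small base-R sketch. The data frame dup and the helper name most_common are hypothetical and not part of the question. Note that sort() is ascending by default, so decreasing = TRUE is what puts the largest count first.

```r
# Return the most frequent value in x (largest count first)
most_common <- function(x) names(sort(table(x), decreasing = TRUE))[1]

# Hypothetical data: employee 1 answered the milk question three times
dup <- data.frame(
  employeeid = c(1, 1, 1, 2),
  question = c("do you like milk?", "do you like milk?",
               "do you like milk?", "do you like milk?"),
  Answer = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Cross-tabulate question by employee, collapsing duplicates with most_common
wide <- tapply(dup$Answer, list(dup$question, dup$employeeid), most_common)
wide
```

Here employee 1 has two "Yes" answers and one "No", so the cell for employee 1 resolves to "Yes".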