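I have a large survey dataset that contains many dummy variables for statements. Each dummy is a factor with the levels "quoted" and "not quoted". Because different groups of statements belong to the same topic, I want to combine each group into one larger factor variable, with the dummies as its levels and the values still "quoted" and "not quoted" (or 1 and 0; that doesn't matter at this point).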
So for two of the dummy variables, what I have now looks like this:
pp_plan_thoughtAWhile  pp_plan_justHappen
not quoted             not quoted
not quoted             not quoted
not quoted             not quoted
not quoted             not quoted
not quoted             quoted
quoted                 quoted
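To make the answers below reproducible, here is a hypothetical reconstruction of the six rows shown, as a data frame named df1 (the name Answer 0 uses). It assumes both columns are factors with the levels "not quoted" and "quoted".

```r
# Hypothetical reconstruction of the six rows shown above; the name df1
# matches the one used in Answer 0
lv <- c("not quoted", "quoted")
df1 <- data.frame(
  pp_plan_thoughtAWhile = factor(
    c("not quoted", "not quoted", "not quoted", "not quoted", "not quoted", "quoted"),
    levels = lv
  ),
  pp_plan_justHappen = factor(
    c("not quoted", "not quoted", "not quoted", "not quoted", "quoted", "quoted"),
    levels = lv
  )
)

# Each column is a two-level factor, as described in the question
str(df1)
```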
I need it to look like this:
# plan
value  thoughtAWhile  justHappen
    0            350         550
    1            650         450
Does anyone know how to do this? Any help is greatly appreciated, I'm struggling!
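For the general case in the question (many statements, grouped by topic), one possible base-R sketch splits the columns by topic and tabulates each group. It assumes the topic is everything before the last underscore in a column name (pp_plan for pp_plan_thoughtAWhile) and that every dummy in a group shares the same factor levels; the pp_goal_* columns and all values are made up for illustration.

```r
# Hypothetical survey data: two topics ("pp_plan" and "pp_goal"), each with
# two statement dummies coded as factors with the same two levels
lv <- c("not quoted", "quoted")
survey <- data.frame(
  pp_plan_thoughtAWhile = factor(c("quoted", "not quoted", "quoted"), levels = lv),
  pp_plan_justHappen    = factor(c("not quoted", "not quoted", "quoted"), levels = lv),
  pp_goal_career        = factor(c("quoted", "quoted", "not quoted"), levels = lv),
  pp_goal_family        = factor(c("not quoted", "quoted", "quoted"), levels = lv)
)

# Assumption: the topic is everything before the last underscore
topic <- sub("_[^_]+$", "", names(survey))

# split.default() splits a data frame by columns; each group is then
# tabulated column by column into a (level x statement) matrix
by_topic <- lapply(split.default(survey, topic), function(block) {
  counts <- sapply(block, table)
  # keep only the statement part of the name, e.g. "thoughtAWhile"
  colnames(counts) <- sub("^.*_", "", colnames(counts))
  counts
})

by_topic$pp_plan
```

Because the columns are factors, table() also reports levels with a count of zero, so every group gets both a "not quoted" and a "quoted" row even if nobody chose one of them.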
Answer 0 (score: 2)
We can use gather to reshape the dataset to "long" format, then get the frequencies with count and spread them back to "wide" format:
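library(tidyverse)
# long format: one row per (column name, answer) pair
gather(df1) %>%
  # frequency of each answer within each original column
  count(key, value) %>%
  # wide format again: one column per original dummy
  spread(key, n)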
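gather() and spread() still work but are superseded in current tidyr by pivot_longer() and pivot_wider(). A roughly equivalent sketch, using a small made-up df1 with character columns; the names_prefix argument also strips the pp_plan_ prefix so the column names match the layout asked for:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data in the shape described in the question
df1 <- data.frame(
  pp_plan_thoughtAWhile = c("not quoted", "not quoted", "not quoted",
                            "not quoted", "not quoted", "quoted"),
  pp_plan_justHappen    = c("not quoted", "not quoted", "not quoted",
                            "not quoted", "quoted", "quoted")
)

plan <- df1 %>%
  # long format; names_prefix drops the shared "pp_plan_" prefix
  pivot_longer(everything(), names_to = "statement",
               names_prefix = "pp_plan_", values_to = "value") %>%
  # one row per (statement, answer) with its frequency in column n
  count(statement, value) %>%
  # wide format: one column per statement, 0 where a combination is missing
  pivot_wider(names_from = statement, values_from = n, values_fill = 0L)

plan
```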
Answer 1 (score: 0)
Here is one approach.
Data
# simulate 10 answers per statement (no seed is set, so every run differs)
pp_plan_thoughtAWhile <- sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.7, 0.3))
pp_plan_justHappen <- sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.5, 0.5))
dv <- data.frame(pp_plan_justHappen, pp_plan_thoughtAWhile)
Processing
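# make both columns factors (since R 4.0, data.frame() keeps strings as character)
dv$pp_plan_justHappen <- as.factor(dv$pp_plan_justHappen)
dv$pp_plan_thoughtAWhile <- as.factor(dv$pp_plan_thoughtAWhile)
library(reshape2)
# with only factor columns, melt() uses them all as id variables,
# so mdata keeps the same two columns as dv
mdata <- melt(dv)
# recode each statement to 1 ("Quoted") / 0 ("NotQuoted")
mdata$bin_plan_justhappen <- ifelse(mdata$pp_plan_justHappen == "Quoted", 1, 0)
mdata$bin_plan_thoughtwhile <- ifelse(mdata$pp_plan_thoughtAWhile == "Quoted", 1, 0)
library(plyr)
# cross-tabulate the two binary columns
table(mdata$bin_plan_justhappen, mdata$bin_plan_thoughtwhile)
# the same counts as a data frame, one row per observed combination
plyr::count(mdata, c("bin_plan_justhappen", "bin_plan_thoughtwhile"))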
Result (from one random draw, so your counts will differ)
bin_plan_justhappen  bin_plan_thoughtwhile  freq
                  0                      1     2
                  1                      0     1
                  1                      1     7
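The table above is a cross-tabulation of the two statements (how often each combination of answers occurs), not the per-statement counts shown in the question. If per-statement counts of 0 and 1 are what you want, here is a base-R sketch on made-up data with the same "Quoted"/"NotQuoted" coding as dv:

```r
# Hypothetical data with the same coding as dv in this answer
dv <- data.frame(
  pp_plan_justHappen    = c("Quoted", "NotQuoted", "Quoted", "Quoted"),
  pp_plan_thoughtAWhile = c("Quoted", "NotQuoted", "NotQuoted", "Quoted")
)

# Recode every column at once: the comparison gives a logical matrix,
# multiplying by 1L turns it into 1 ("Quoted") / 0 (anything else)
bin <- (as.matrix(dv) == "Quoted") * 1L

# Count 0s and 1s per statement: rows are values, columns are statements
counts <- rbind(`0` = colSums(bin == 0), `1` = colSums(bin == 1))
counts
```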