Question

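I have a large dataset from a survey that contains many dummy variables for statements. Each dummy is a factor with the levels "quoted" and "not quoted". Since different groups of statements belong to the same topic, I want to combine each group into one larger factor variable that has the dummies as its levels, with the values staying "quoted" and "not quoted" (or 1 and 0, that doesn't matter at the moment).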

So right now, two of my dummy variables look like this:

    pp_plan_thoughtAWhile   pp_plan_justHappen  
     not quoted                  not quoted 
     not quoted                  not quoted 
     not quoted                  not quoted 
     not quoted                  not quoted 
     not quoted                  quoted 
     quoted                      quoted

I need it to look like this:

               #plan 
      ## value     thoughtAWhile    justHappen
           0           350             550  
           1           650             450

Does anyone know how to do this? Any help is highly appreciated, I'm struggling!
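A minimal base-R sketch of the reshaping described above, assuming the six example rows are the data (the name df1 is hypothetical). stack() turns the dummy columns into one long table whose ind column is a factor with one level per dummy, which is exactly the "one larger factor variable" the question asks for; table() then gives counts in the requested layout.

```r
# Six example rows from the question; `df1` is a hypothetical name for the survey data
df1 <- data.frame(
  pp_plan_thoughtAWhile = c(rep("not quoted", 5), "quoted"),
  pp_plan_justHappen    = c(rep("not quoted", 4), "quoted", "quoted"),
  stringsAsFactors = TRUE  # the survey dummies are factors
)

# stack() keeps only plain vector elements, so pass the factors as character vectors;
# the result has one row per (dummy, answer) pair: `values` and `ind` (the column name)
long <- stack(lapply(df1, as.character))

# The new "plan" factor: one level per dummy, with the "pp_plan_" prefix stripped
plan_levels <- sub("^pp_plan_", "", names(df1))
long$plan  <- factor(sub("^pp_plan_", "", long$ind), levels = plan_levels)
long$value <- as.integer(long$values == "quoted")  # 1 = quoted, 0 = not quoted

# Counts in the layout the question asks for: rows = value, columns = plan
tab <- table(value = long$value, plan = long$plan)
print(tab)
```

With these six rows, thoughtAWhile gets five 0s and one 1, and justHappen gets four 0s and two 1s. On the full survey the same code should yield the 350/650 style counts, one column per dummy in the group.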

Answer 1

We can use gather to reshape the dataset into "long" format, then get the frequencies with count and spread them back into "wide" format:

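    library(tidyverse)

    # df1: the survey data frame holding the dummy columns
    gather(df1) %>%          # long format: one row per (key = column name, value) pair
       count(key, value) %>% # frequency of each value within each column
       spread(key, n)        # wide again: one column per original dummy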
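A self-contained sketch of the same long, count, wide idea, assuming dplyr and tidyr are installed and using the question's six example rows as hypothetical data df1. It swaps gather()/spread() for pivot_longer()/pivot_wider(), which tidyr now recommends in their place; names_prefix strips the pp_plan_ prefix so the new columns are named justHappen and thoughtAWhile.

```r
library(dplyr)
library(tidyr)

# Hypothetical data frame with the question's six example rows
df1 <- data.frame(
  pp_plan_thoughtAWhile = c(rep("not quoted", 5), "quoted"),
  pp_plan_justHappen    = c(rep("not quoted", 4), "quoted", "quoted"),
  stringsAsFactors = TRUE
)

res <- df1 %>%
  # long format: `plan` holds the dummy name (prefix removed), `value` the answer
  pivot_longer(everything(), names_to = "plan", names_prefix = "pp_plan_",
               values_to = "value") %>%
  # frequency of each answer within each dummy
  count(plan, value) %>%
  # wide again: one column per dummy; combinations that never occur become 0
  pivot_wider(names_from = plan, values_from = n, values_fill = 0L)

print(res)
```

values_fill = 0L matters on real data: without it, a dummy that nobody quoted would show NA instead of 0 in the "quoted" row.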

Answer 2

Here is one approach.

Data

    pp_plan_thoughtAWhile <- sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.7, 0.3))
    pp_plan_justHappen    <- sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.5, 0.5))
    dv <- data.frame(pp_plan_justHappen, pp_plan_thoughtAWhile)

Processing

    dv$pp_plan_justHappen    <- as.factor(dv$pp_plan_justHappen)
    dv$pp_plan_thoughtAWhile <- as.factor(dv$pp_plan_thoughtAWhile)

    library(reshape2)
    # all columns are factors, so melt() uses them as id variables
    mdata <- melt(dv)

    # recode each dummy to 1 (Quoted) / 0 (NotQuoted)
    mdata$bin_plan_justhappen   <- ifelse(mdata$pp_plan_justHappen == "Quoted", 1, 0)
    mdata$bin_plan_thoughtwhile <- ifelse(mdata$pp_plan_thoughtAWhile == "Quoted", 1, 0)

    library(plyr)
    # cross-tabulate, then count each observed combination of the two binary columns
    table(mdata$bin_plan_justhappen, mdata$bin_plan_thoughtwhile)
    plyr::count(mdata, c("bin_plan_justhappen", "bin_plan_thoughtwhile"))

Result

    bin_plan_justhappen bin_plan_thoughtwhile freq
                      0                     1    2
                      1                     0    1
                      1                     1    7
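Note that plyr::count tallies combinations of the two dummies jointly, whereas the table in the question counts each dummy separately. A minimal base-R sketch of the per-dummy version, regenerating data like the above; set.seed(1) is an assumption added only so the sketch is reproducible, so its counts will differ from the result shown.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dv <- data.frame(
  pp_plan_justHappen    = sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.5, 0.5)),
  pp_plan_thoughtAWhile = sample(c("Quoted", "NotQuoted"), 10, replace = TRUE, prob = c(0.7, 0.3)),
  stringsAsFactors = TRUE
)

# 0/1 recoding of every dummy at once: a 10 x 2 integer matrix, one column per dummy
bin <- sapply(dv, function(x) as.integer(x == "Quoted"))

# Joint counts of the two dummies (the combinations plyr::count tallies)
print(table(justHappen = bin[, "pp_plan_justHappen"],
            thoughtAWhile = bin[, "pp_plan_thoughtAWhile"]))

# Per-dummy counts, the layout the question asks for: rows = value, columns = dummy
per_dummy <- rbind(`0` = colSums(bin == 0), `1` = colSums(bin == 1))
print(per_dummy)
```

Each column of per_dummy sums to the number of respondents, which is a quick sanity check that every answer was counted exactly once per dummy.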

How to convert multiple dummy variables into one factor variable?
