Create groups based on variable names in columns and get counts

Date: 2016-02-23 21:39:00

Tags: r loops count dplyr pipe

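I have seen several ways to take data and create counts by group, but what I want to do is a bit more complicated. I have a dataset similar to the one below: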

d <- data.frame(ID=c("1ef","3ic","9sd"),
            CI_Region=c("Bay Area","North Sierra","Central Valley"),
            Q18_429=c("Not a threat","Slightly serious","Very Serious"),
            Q18_430=c("Extremely serious","Somewhat serious","Slightly serious"),
            Q18_431=c("Slightly serious","Unknown","No Answer"))

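I would like to group by CI_Region and then, for each question, count how often each response occurs (e.g. "Not a threat", "Slightly serious", etc.).

The end result would be a table with one row per combination of CI region, question, and response category, showing the count. That way I could see, for example, that Bay Area, Q18_429, "Not a threat" = 1.

Thanks in advance!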

1 answer:

Answer 0: (score: 0)

  d <- data.frame(ID=c("1ef","3ic","9sd"),
            CI_Region=c("Bay Area","North Sierra","Central Valley"),
            Q18_429=c("Not a threat","Slightly serious","Very Serious"),
            Q18_430=c("Extremely serious","Somewhat serious","Slightly serious"),
            Q18_431=c("Slightly serious","Unknown","No Answer"))

Reshaping the data into a tidier (long) format, with one row per region, question, and response, makes this easier.

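library(dplyr)  # provides %>%, group_by() and tally()
library(tidyr)  # provides gather()

# Stack the question columns into question/response pairs, then count each combination
gather(d, question, response, -ID, -CI_Region) %>% 
  group_by(CI_Region, question, response) %>% 
  tally()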



     CI_Region question          response     n
        (fctr)   (fctr)             (chr) (int)
      Bay Area  Q18_429      Not a threat     1
      Bay Area  Q18_430 Extremely serious     1
      Bay Area  Q18_431  Slightly serious     1
Central Valley  Q18_429      Very Serious     1
Central Valley  Q18_430  Slightly serious     1
Central Valley  Q18_431         No Answer     1
  North Sierra  Q18_429  Slightly serious     1
  North Sierra  Q18_430  Somewhat serious     1
  North Sierra  Q18_431           Unknown     1

Is this what you are after?
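For readers on a newer tidyr, here is a minimal sketch of the same reshape-then-count idea. It assumes tidyr 1.0.0 or later, where pivot_longer() is available and gather() is documented as superseded, and it assumes every question column starts with "Q18_", as in the sample data. The name `counts` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
d <- data.frame(ID = c("1ef", "3ic", "9sd"),
                CI_Region = c("Bay Area", "North Sierra", "Central Valley"),
                Q18_429 = c("Not a threat", "Slightly serious", "Very Serious"),
                Q18_430 = c("Extremely serious", "Somewhat serious", "Slightly serious"),
                Q18_431 = c("Slightly serious", "Unknown", "No Answer"))

counts <- d %>%
  # Assumption: all question columns share the "Q18_" prefix
  pivot_longer(cols = starts_with("Q18_"),
               names_to = "question",
               values_to = "response") %>%
  # count() is shorthand for group_by() followed by tally()
  count(CI_Region, question, response)

print(counts)
```

With real survey data, where a region has many respondents, `n` would be greater than 1 wherever several people in the same region gave the same response to the same question.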