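Format survey responses into a tidy R data frame

Time: 2016-01-26 22:47:45

Tags: r csv

We use a third-party survey application that only provides responses as HTML, which I then write to a CSV file that looks like this: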

2,"Rank features by frequency of use ","AMER","JAPAC","EMEA","Total"
2.1,"Stored procedures ",,,,
"Never",,1,2,1,4
"Sometimes (<50% of applications)",,10,6,5,21
"Often (>50%)",,7,4,2,13
"Always",,1,0,0,1
2.2,"Triggers ",,,,
"Never","  ",4,3,2,9
"Sometimes (<50% of applications)","  ",13,9,3,25
"Often (>50%)","  ",2,0,2,4
"Always","  ",0,0,1,1

This pattern continues, sometimes with more than three responses per question. I'd like it in a "tidy" format:

Q.Num, Response, Never, Sometimes, Often, Always, Other.Response, 
2.1, "Stored Procedures", 4, 21, 13, 1
2.2, "Triggers", 9, 25, 4, 1

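(I can manage mapping the questions to numbers elsewhere, and I'm dropping the regional data for now.)

I think I could do this with a loop (easier in Python), but I'm hoping there is a more "R-like" way...

1 answer:

Answer 0: (score: 0)

Since you asked for a tidy data.frame, I thought I'd give you one using the tidyr package.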

library(tidyr)
library(dplyr)

d <- read.csv("path/to/data.csv")

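col_n <- head(d$X2, 5)[-1] # The four response labels: "Never", "Sometimes (<50% of applications)", "Often (>50%)", "Always"

# Question rows: question number (X2) and question text
d_1 <- d[!d$X2 %in% col_n, c("X2", "Rank.features.by.frequency.of.use.")]

# Response rows: the counts per region, without the empty question-text column
d_2 <- d[d$X2 %in% col_n, !names(d) %in% "Rank.features.by.frequency.of.use."]

# Attach each question's number and text to its response rows
# (this assumes every question has exactly length(col_n) responses)
d_2$Q.Num <- rep(d_1$X2, each = length(col_n))

d_2$Response <- rep(d_1$Rank.features.by.frequency.of.use., each = length(col_n))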

d_2 %>% 
  gather(key, value, -Q.Num, -Response, -X2) %>%
  spread(X2, value) %>%
  rename(Country = key)

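This dataset includes all of the countries, not just the Total column listed in your example. However, from your post it seems you would want the country-level data anyway.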
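
If only the Total column is wanted, as in the layout the question asks for, and questions may have differing numbers of responses, a base R variant along the following lines might work. This is only a sketch, not part of the original answer. It assumes that question rows can be recognised by a first field shaped like "2.1", that the section header line is skipped on read, and that shortening each response label at its first " (" is acceptable. The names `csv_text`, `raw`, `is_question`, `group`, `answers`, `wide` and `tidy` are illustrative.

```r
# Inline copy of the sample rows from the question, so the sketch runs on its own.
csv_text <- '2,"Rank features by frequency of use ","AMER","JAPAC","EMEA","Total"
2.1,"Stored procedures ",,,,
"Never",,1,2,1,4
"Sometimes (<50% of applications)",,10,6,5,21
"Often (>50%)",,7,4,2,13
"Always",,1,0,0,1
2.2,"Triggers ",,,,
"Never","  ",4,3,2,9
"Sometimes (<50% of applications)","  ",13,9,3,25
"Often (>50%)","  ",2,0,2,4
"Always","  ",0,0,1,1'

# Skip the section header line and supply our own (hypothetical) column names.
raw <- read.csv(text = csv_text, header = FALSE, skip = 1,
                col.names = c("label", "text", "AMER", "JAPAC", "EMEA", "Total"),
                stringsAsFactors = FALSE)

# Assumption: a question row is one whose first field looks like "2.1".
is_question <- grepl("^[0-9]+\\.[0-9]+$", raw$label)

# Running count of question rows: every row gets the index of the question
# it belongs to, however many responses follow that question.
group <- cumsum(is_question)

questions <- raw[is_question, c("label", "text")]
answers   <- raw[!is_question, ]
answers$Q.Num <- questions$label[group[!is_question]]

# Shorten "Sometimes (<50% of applications)" to "Sometimes", keeping first-seen order.
short <- sub(" \\(.*$", "", answers$label)
answers$Answer <- factor(short, levels = unique(short))

# One row per question, one column per response, filled from the Total column.
# A response that a question does not offer ends up as NA.
wide <- tapply(answers$Total, list(answers$Q.Num, answers$Answer), sum)

tidy <- cbind(
  data.frame(Q.Num = rownames(wide),
             Response = trimws(questions$text[match(rownames(wide), questions$label)]),
             stringsAsFactors = FALSE),
  as.data.frame(wide)
)
rownames(tidy) <- NULL
print(tidy)
```

Because the grouping comes from `cumsum()` rather than `rep(..., each = length(col_n))`, the number of responses does not have to be the same for every question. Replacing `answers$Total` with one of the region columns gives the same table for that region.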