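We use a third-party survey application that only provides its responses as HTML, which I then write out to a CSV file that looks like this: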
2,"Rank features by frequency of use ","AMER","JAPAC","EMEA","Total"
2.1,"Stored procedures ",,,,
"Never",,1,2,1,4
"Sometimes (<50% of applications)",,10,6,5,21
"Often (>50%)",,7,4,2,13
"Always",,1,0,0,1
2.2,"Triggers ",,,,
"Never"," ",4,3,2,9
"Sometimes (<50% of applications)"," ",13,9,3,25
"Often (>50%)"," ",2,0,2,4
"Always"," ",0,0,1,1
This pattern continues, sometimes with more than three responses per question. I would like it in a "tidy" format:
Q.Num, Response, Never, Sometimes, Often, Always, Other.Response,
2.1, "Stored Procedures", 4, 21, 13, 1
2.2, "Triggers", 9, 25, 4, 1
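(I can handle mapping the questions to their numbers elsewhere, and I am dropping the regional data for the time being.)
I think I could do this with a loop (which would be easier in Python), but I am hoping there is a more "R-like" way...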
Answer 0 (score: 0)
Since you asked for a tidy data.frame, I thought I would give you one using the tidyr package.
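library(tidyr)
library(dplyr)
# Read text as character rather than factor. The header row yields the
# column names X2 and Rank.features.by.frequency.of.use.
d <- read.csv("path/to/data.csv", stringsAsFactors = FALSE)
col_n <- head(d$X2, 5)[-1] # answer labels: "Never", "Sometimes ...", "Often ...", "Always"
# Split question rows (2.1, 2.2, ...) from answer rows
d_1 <- d[!d$X2 %in% col_n, c("X2", "Rank.features.by.frequency.of.use.")]
d_2 <- d[d$X2 %in% col_n, !names(d) %in% "Rank.features.by.frequency.of.use."]
# Label each answer row with its question.
# This assumes every question has exactly length(col_n) answers.
d_2$Q.Num <- rep(d_1$X2, each = length(col_n))
d_2$Response <- rep(d_1$Rank.features.by.frequency.of.use., each = length(col_n))
# Go long by region, then spread to one column per answer
d_2 %>%
  gather(key, value, -Q.Num, -Response, -X2) %>%
  spread(X2, value) %>%
  rename(Country = key)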
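The resulting data set includes every region (in the Country column), not just the totals listed in your example. From your post, though, it seems you will eventually want the region-level data.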
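If only the totals are wanted, as in the layout asked for in the question, a variant along the following lines may work. Treat it as a sketch under stated assumptions rather than a drop-in: it swaps gather()/spread() for pivot_wider() (tidyr 1.0.0 or later), uses fill() to carry each question down to its answer rows so that questions may have different numbers of answers, and assumes question rows are exactly those whose first field looks like a number such as 2.1. The column names label and text, and the inline csv_text sample, are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Inline copy of the sample from the question, so the sketch runs on its own.
csv_text <- '2,"Rank features by frequency of use ","AMER","JAPAC","EMEA","Total"
2.1,"Stored procedures ",,,,
"Never",,1,2,1,4
"Sometimes (<50% of applications)",,10,6,5,21
"Often (>50%)",,7,4,2,13
"Always",,1,0,0,1
2.2,"Triggers ",,,,
"Never"," ",4,3,2,9
"Sometimes (<50% of applications)"," ",13,9,3,25
"Often (>50%)"," ",2,0,2,4
"Always"," ",0,0,1,1'

# Skip the header line and supply plain column names (hypothetical names).
raw <- read.csv(
  text = csv_text, header = FALSE, skip = 1,
  col.names = c("label", "text", "AMER", "JAPAC", "EMEA", "Total"),
  stringsAsFactors = FALSE
)

tidy <- raw %>%
  mutate(
    # Assumption: a question row starts with a number like 2.1
    is_question = grepl("^[0-9]+(\\.[0-9]+)*$", label),
    Q.Num       = ifelse(is_question, label, NA_character_),
    Response    = ifelse(is_question, trimws(text), NA_character_)
  ) %>%
  fill(Q.Num, Response) %>%                      # carry the question down to its answers
  filter(!is_question) %>%                       # keep answer rows only
  mutate(answer = sub(" .*$", "", label)) %>%    # "Often (>50%)" becomes "Often"
  select(Q.Num, Response, answer, Total) %>%
  pivot_wider(names_from = answer, values_from = Total)

print(tidy)
```

On the sample data this should give one row per question with the columns Q.Num, Response, Never, Sometimes, Often and Always. A question that has an extra answer would simply add a column, with NA for the questions that lack it.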