I have a data frame called questions:
q1 q2 q3
A A B
C A A
A B C
I want to reshape it into:
question answer freq
1 A 2
1 B 0
1 C 1
2 A 2
2 B 1
2 C 0
3 A 1
3 B 1
3 C 1
I feel like there should be a way to do this with reshape2 or plyr, but I can't figure it out.
Instead, I did the following:
tbl <- data.frame()
for(i in 1:dim(questions)[2]){
  # Count each answer in column i and prepend the question number
  subtable <- cbind(question = rep(i, 3),
                    as.data.frame(table(questions[i])))
  tbl <- rbind(tbl, subtable)
}
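As written, the loop assumes every column's table has exactly three rows (rep(i, 3)). With the example data that is not the case unless the columns are already factors sharing the levels A, B, C: q1 never contains B, so table() returns only two rows and cbind() fails. It may also stop at rbind(), because as.data.frame(table(questions[i])) names its first column after the question (q1, q2, ...), so the pieces do not share column names. A minimal sketch of the same loop that fixes both points, assuming character columns like the example:

```r
# Example data from the question, stored as character columns
questions <- data.frame(q1 = c("A", "C", "A"),
                        q2 = c("A", "A", "B"),
                        q3 = c("B", "A", "C"),
                        stringsAsFactors = FALSE)

# Every answer that appears anywhere, so each per-question table gets the same rows
answers <- sort(unique(unlist(questions)))

tbl <- data.frame()
for (i in seq_along(questions)) {
  # Counting a factor with fixed levels keeps zero counts (e.g. B for q1)
  counts <- as.data.frame(table(factor(questions[[i]], levels = answers)))
  # Give every piece the same column names so rbind() can stack them
  names(counts) <- c("answer", "freq")
  tbl <- rbind(tbl, cbind(question = i, counts))
}
tbl
```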
Is there a cleaner way to reshape this table?
Answer 0 (score: 5)
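Here is a base R approach that is similar in concept to the one @akrun posted. I didn't bother with the cleanup, since it is mostly cosmetic and not related to the core of the question.
The general approach is:
data.frame(table(stack(mydf)))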
However, stack doesn't work with factor columns, so if your data are factor rather than character, you have to convert them with as.character first, like this:
data.frame(table(stack(lapply(mydf, as.character))))
# values ind Freq
# 1 A q1 2
# 2 B q1 0
# 3 C q1 1
# 4 A q2 2
# 5 B q2 1
# 6 C q2 0
# 7 A q3 1
# 8 B q3 1
# 9 C q3 1
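The cleanup this answer leaves out is only renaming and relabeling. A minimal sketch of it, assuming the column names q1, q2, q3 from the question and the headers question, answer, freq that the question asks for:

```r
mydf <- data.frame(q1 = c("A", "C", "A"),
                   q2 = c("A", "A", "B"),
                   q3 = c("B", "A", "C"),
                   stringsAsFactors = FALSE)

res <- data.frame(table(stack(lapply(mydf, as.character))))
# stack() names its columns "values" and "ind"; put them in the requested order
res <- res[, c("ind", "values", "Freq")]
names(res) <- c("question", "answer", "freq")
# Turn "q1", "q2", ... into 1, 2, ... by dropping the non-digit prefix
res$question <- as.integer(sub("\\D+", "", res$question))
res
```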
Moving away from "plyr" and "reshape2" and toward "dplyr" and "tidyr", you could try:
library(dplyr)
library(tidyr)
mydf %>%
  gather(question, answer, everything()) %>% ## Get the data into a long form
  group_by(question, answer) %>%             ## Group by both question and answer columns
  summarise(freq = n()) %>%                  ## Calculate the relevant frequency
  right_join(expand(., question, answer))    ## Merge with all combinations of Qs and As
# Joining by: c("question", "answer")
# Source: local data frame [9 x 3]
# Groups: question
#
# question answer freq
# 1 q1 A 2
# 2 q1 B NA
# 3 q1 C 1
# 4 q2 A 2
# 5 q2 B 1
# 6 q2 C NA
# 7 q3 A 1
# 8 q3 B 1
# 9 q3 C 1
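The NA entries are the question/answer pairs that never occur. A sketch using newer tidyr verbs that fills them with 0 instead; it assumes dplyr 1.0 or later (for the name argument of count()) and tidyr 1.0 or later (for pivot_longer(), which supersedes gather()):

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(q1 = c("A", "C", "A"),
                   q2 = c("A", "A", "B"),
                   q3 = c("B", "A", "C"),
                   stringsAsFactors = FALSE)

res <- mydf %>%
  pivot_longer(everything(), names_to = "question", values_to = "answer") %>%
  count(question, answer, name = "freq") %>%          # one row per observed pair
  complete(question, answer, fill = list(freq = 0L))  # add unseen pairs with freq 0
res
```

complete() expands to every combination of the listed columns and joins the counts back, which is the same idea as the right_join(expand(...)) step above, with the fill value supplied directly.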
Answer 1 (score: 3)
Try
library(qdapTools)
library(reshape2)
colnames(questions) <- sub('\\D+', '', colnames(questions))
setNames(melt(as.matrix(mtabulate(questions))),
         c('question', 'answer', 'freq'))
Or using data.table:
library(data.table) # v1.9.5+
setkey(
  setnames(
    melt(setDT(questions, keep.rownames=TRUE), id.var='rn',
         value.name='answer')[, list(freq=.N),
                              by=list(variable, answer)],
    'variable', 'question'),
  question, answer)[
    CJ(question=unique(question), answer=unique(answer))][
      is.na(freq), freq:=0L][]
# question answer freq
#1: 1 A 2
#2: 1 B 0
#3: 1 C 1
#4: 2 A 2
#5: 2 B 1
#6: 2 C 0
#7: 3 A 1
#8: 3 B 1
#9: 3 C 1
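The nested call above counts, keys, and joins against all combinations in one expression. The same count-then-join idea can be written in separate steps with the on= join syntax (available since data.table 1.9.6); a sketch assuming the example data as character columns:

```r
library(data.table)

dt <- data.table(q1 = c("A", "C", "A"),
                 q2 = c("A", "A", "B"),
                 q3 = c("B", "A", "C"))

# Long form: one row per (question, answer) observation
long <- melt(dt, measure.vars = names(dt),
             variable.name = "question", value.name = "answer")
# Count the pairs that actually occur
counts <- long[, .(freq = .N), by = .(question, answer)]
# All question/answer combinations, sorted
grid <- CJ(question = unique(long$question), answer = unique(long$answer))
# Join counts onto the full grid; unseen pairs come back as NA
res <- counts[grid, on = c("question", "answer")]
res[is.na(freq), freq := 0L]
res
```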
Answer 2 (score: 3)
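Yes, it's a bit tricky because of the zeros. After melting, instead of casting directly into the shape you need, cast to wide form and then melt again. Using base R and table might be just as easy.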
d <- read.table(text="q1 q2 q3
A A B
C A A
A B C", header=TRUE, as.is=TRUE)
library(reshape2)
melt(dcast(melt(d, measure.vars=1:3), value ~ variable))
## Aggregation function missing: defaulting to length
## Using value as id variables
## value variable value
## 1 A q1 2
## 2 B q1 0
## 3 C q1 1
## 4 A q2 2
## 5 B q2 1
## 6 C q2 0
## 7 A q3 1
## 8 B q3 1
## 9 C q3 1
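For the base R and table route mentioned above, one possibility is to count each column against a shared set of levels and then flatten the resulting matrix. A sketch, assuming d holds character columns as read above:

```r
d <- read.table(text = "q1 q2 q3
A A B
C A A
A B C", header = TRUE, as.is = TRUE)

# Shared answer levels so every question gets a row for every answer
lv <- sort(unique(unlist(d)))
# Matrix of counts: rows = answers, columns = questions, zeros included
counts <- sapply(d, function(col) table(factor(col, levels = lv)))
# as.table() + as.data.frame() turns the matrix into long form
res <- as.data.frame(as.table(counts))
names(res) <- c("answer", "question", "freq")
res
```

The first column of the long result varies fastest, so the rows come out grouped by question (A, B, C for q1, then q2, then q3), in the same order as the output above.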