I have survey data. It comes from a question that looks like this:
Did you do any of the following activities during your PhD?
Yes, paid by my school. Yes, paid by me. No.
Attended an international conference?
Bought textbooks?
The data are saved automatically in a spreadsheet like this:
id  conf.1  conf.2  conf.3  text.1  text.2  text.3
1   1                               1
2           1               1
3                   1       1
4                   1                       1
5
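This means that participant 1 attended a conference paid for by her university, participant 2 attended a conference and paid for it personally, and participant 3 did not attend one.
I want to merge conf.1, conf.2 and conf.3 into a single variable, and likewise text.1, text.2 and text.3: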
id new.conf new.text
1 1 2
2 2 1
3 3 1
4 3 3
where the number now represents the category chosen for that survey question.
Thanks for your help
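Answer 0 (score: 2)
You did not say whether a set of questions can have more than one answer. If it can, this approach may not work for you, and I would suggest making your question reproducible before going any further. With that caveat out of the way, give this a whirl: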
library(reshape2)
#recreate your data
dat <- data.frame(id = 1:5,
conf.1 = c(1,rep(NA,4)),
conf.2 = c(NA,1, rep(NA,3)),
conf.3 = c(NA,NA,1,1, NA),
text.1 = c(NA,1,1,NA,NA),
text.2 = c(1, rep(NA,4)),
text.3 = c(rep(NA,3),1, NA))
#melt into long format
dat.m <- melt(dat, id.vars = "id")
#Split on the "."
dat.m[, c("variable", "val")] <- with(dat.m, colsplit(variable, "\\.", c("variable", "val")))
#Subset out only the complete cases
dat.m <- dat.m[complete.cases(dat.m),]
#Cast back into wide format
dcast(id ~ variable, value.var = "val", data = dat.m)
#-----
id conf text
1 1 1 2
2 2 2 1
3 3 3 1
4 4 3 3
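To check the caveat above before reshaping, a quick sketch like the following counts how many options each respondent ticked per question. `count_ticked` is a hypothetical helper name, not part of reshape2, and the sketch assumes that an unticked option is stored as NA, as in the recreated data.

```r
# Recreate the example data used in this answer
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

# Hypothetical helper: number of ticked (non-NA) options per row
# for the columns named "<prefix>.1", "<prefix>.2", ...
count_ticked <- function(data, prefix) {
  cols <- grep(paste0("^", prefix, "\\."), names(data), value = TRUE)
  rowSums(!is.na(data[cols]))
}

n_conf <- count_ticked(dat, "conf")
n_text <- count_ticked(dat, "text")

# ids of respondents who ticked more than one option for any question
multi <- dat$id[n_conf > 1 | n_text > 1]
multi
```

If `multi` comes back empty, every respondent gave at most one answer per question and the melt/dcast approach should be safe to use.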
Answer 1 (score: 0)
Here is a base R approach that copes with missing values:
confvars <- c("conf.1","conf.2","conf.3")
textvars <- c("text.1","text.2","text.3")
which.sub <- function(x) {
maxsub <- apply(dat[x],1,which.max)
maxsub[(lapply(maxsub,length)==0)] <- NA
return(unlist(maxsub))
}
data.frame(
id = dat$id,
conf = which.sub(confvars),
text = which.sub(textvars)
)
Result:
id conf text
1 1 1 2
2 2 2 1
3 3 3 1
4 4 3 3
5 5 NA NA
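If you would rather not rely on a global `dat`, a variant along these lines takes the data frame as an argument. `which_ticked` is a hypothetical name, and the sketch assumes that unticked options are NA and that, if several options are ticked, keeping the first one is acceptable.

```r
# Recreate the example data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

# Hypothetical helper: position of the first non-NA column in each row,
# or NA when the respondent ticked nothing
which_ticked <- function(data, cols) {
  vapply(seq_len(nrow(data)), function(i) {
    hit <- which(!is.na(unlist(data[i, cols])))
    if (length(hit) == 0) NA_integer_ else hit[[1]]
  }, integer(1))
}

res <- data.frame(
  id   = dat$id,
  conf = which_ticked(dat, c("conf.1", "conf.2", "conf.3")),
  text = which_ticked(dat, c("text.1", "text.2", "text.3"))
)
res
```

Using `vapply` with `integer(1)` means the result is always a plain integer vector, whether or not some rows are empty.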
Answer 2 (score: 0)
The following solution is very simple and I use it often. Let's use the same data frame that Chase built above.
dat <- data.frame(id = 1:5,
conf.1 = c(1,rep(NA,4)),
conf.2 = c(NA,1, rep(NA,3)),
conf.3 = c(NA,NA,1,1, NA),
text.1 = c(NA,1,1,NA,NA),
text.2 = c(1, rep(NA,4)),
text.3 = c(rep(NA,3),1, NA))
First we replace the NAs with zeros.
dat[is.na(dat)] <- 0
By multiplying each column by a different number, we can then compute the new variables directly.
dat <- transform(dat, conf=conf.1 + 2*conf.2 + 3*conf.3,
text=text.1 + 2*text.2 + 3*text.3)
Finally, recode the zeros in the new variables (or in the whole data set) back to NA, and we are done.
dat[dat == 0] <- NA
> dat
id conf.1 conf.2 conf.3 text.1 text.2 text.3 conf text
1 1 1 NA NA NA 1 NA 1 2
2 2 NA 1 NA 1 NA NA 2 1
3 3 NA NA 1 1 NA NA 3 1
4 4 NA NA 1 NA NA 1 3 3
5 5 NA NA NA NA NA NA NA NA
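The same weighted sum can be written as a matrix product, which saves typing when a question has many options. This is only a sketch: `conf_cols`, `m` and `code` are names made up here, and it assumes each respondent ticked at most one option. With two ticks the sum becomes ambiguous, since ticking options 1 and 2 gives 3, the same as ticking option 3 alone.

```r
# Recreate the example data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

conf_cols <- c("conf.1", "conf.2", "conf.3")

# 0/1 indicator matrix for the conference question
m <- as.matrix(dat[conf_cols])
m[is.na(m)] <- 0

# Guard against the ambiguous case of several ticks in one row
stopifnot(all(rowSums(m) <= 1))

# Weight column k by k and sum across the row
code <- as.vector(m %*% seq_along(conf_cols))
code[code == 0] <- NA   # no option ticked
code
```

Working on a copy of the indicator columns also leaves the original NAs in `dat` untouched, so no recoding of the whole data set is needed afterwards.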