Question

我有调查数据。它来自一个看起来像这样的问题：

Did you do any of the following activities during your PhD

                             Yes, paid by my school. Yes, paid by me.  No. 

Attended an internationl conference?
Bought textbooks?

数据会以这种方式自动保存在电子表格中：

id conf.1 conf.2 conf.3 text.1 text.2 text.3

1    1                              1
2           1               1
3                   1       1
4                   1                    1
5

这意味着参与者1参加了她大学支付的会议;参加者2参加了会议，参与者3没有参加。

我想在单个变量中合并conf.1，conf.2和conf.3以及text.1，text.2和text.3

id new.conf new.text

1   1        2
2   2        1
3   3        1
4   3        3

where the number now respresents the categories of the survey question

Thanks for your help

Answer 1

您没有说明每组问题是否都有多个答案。如果是这样，这种方法可能对您不起作用。如果是这种情况，我建议您在继续前进之前再提问reproducible。有了这个警告，请给它一个旋转：

library(reshape2)
#recreate your data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

#melt into long format
dat.m <- melt(dat, id.vars = "id")
#Split on the "."
dat.m[, c("variable", "val")] <- with(dat.m, colsplit(variable, "\\.", c("variable", "val")))
#Subset out only the complete cases
dat.m <- dat.m[complete.cases(dat.m),]
#Cast back into wide format
dcast(id ~ variable, value.var = "val", data = dat.m)
#-----
  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3

Answer 2

这是一个能够应对缺失值的基本方法：

confvars <- c("conf.1","conf.2","conf.3")
textvars <- c("text.1","text.2","text.3")

which.sub <- function(x) {
maxsub <- apply(dat[x],1,which.max)
maxsub[(lapply(maxsub,length)==0)] <- NA
return(unlist(maxsub))
}

data.frame(
id = dat$id,
conf = which.sub(confvars),
text = which.sub(textvars)
)

结果：

  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
5  5   NA   NA

Answer 3

以下解决方案非常简单，我经常使用它。让我们使用Chase在上面做的相同数据框架。

dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

现在我们首先用零替换NA。

dat[is.na(dat)] <- 0

将每列乘以不同的数字，我们可以简单地计算新变量。

dat <- transform(dat, conf=conf.1 + 2*conf.2 + 3*conf.3,
                      text=text.1 + 2*text.2 + 3*text.3)

让我们将新变量（或整个数据集）中的零重新编码为NA并完成。

dat[dat == 0] <- NA 

> dat
  id conf.1 conf.2 conf.3 text.1 text.2 text.3 conf text
1  1      1     NA     NA     NA      1     NA    1    2
2  2     NA      1     NA      1     NA     NA    2    1
3  3     NA     NA      1      1     NA     NA    3    1
4  4     NA     NA      1     NA     NA      1    3    3
5  5     NA     NA     NA     NA     NA     NA   NA   NA

如何合并几个变量在R中创建一个新的因子变量？

3 个答案: