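I'm struggling to use lapply to recode values.
Suppose I have 10 survey questions, each with 4 answer options, exactly one of which is correct. The questions are labeled q_1 to q_10, and my data frame is called df. I want to create new variables with the same sequential labels that simply code each question as "right" (1) or "wrong" (0).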
If I were to list the right answers, they would be:
right_answers<-c(1,2,3,4,2,3,4,1,2,4)
Then I'm trying to write a function that recodes all the variables into new variables while keeping the same sequential identifiers, e.g.
lapply(1:10, function(fx) {
df$know_[fx]<-ifelse(df$q_[fx]==right_answers[fx],1,0)
})
In a hypothetical universe where this code were even remotely correct, I would get the following result:
id q_1 know_1 q_2 know_2
1 1 1 2 1
2 4 0 3 0
3 3 0 2 1
4 4 0 1 0
Thanks so much for your help!
Answer 0 (score: 1)
For the same matrix output as the other answers, I would suggest:
q_names <- paste0("q_", seq_along(right_answers))
answers <- df[q_names]
correct <- mapply(`==`, answers, right_answers)
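A minimal sketch of the remaining step, turning that logical matrix into 0/1 know_ columns on df. The toy df is hypothetical, built from the question's example output with only q_1 and q_2, so only the first two right answers are passed. Keep the two lengths equal: mapply() recycles the shorter argument, so 2 columns against 10 right answers would silently produce 10 comparisons.

```r
# Hypothetical toy data: only q_1 and q_2, taken from the question's example
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

q_names <- paste0("q_", 1:2)
answers <- df[q_names]
# One comparison per column; right_answers is cut to the same length on purpose
correct <- mapply(`==`, answers, right_answers[1:2])

# Multiplying by 1L turns TRUE/FALSE into 1/0 and keeps the matrix shape
know <- correct * 1L
colnames(know) <- sub("^q_", "know_", q_names)
# cbind() with a data frame argument returns a data frame
df <- cbind(df, know)
```

The result has columns id, q_1, q_2, know_1, know_2; reorder them with df[c("id", "q_1", "know_1", "q_2", "know_2")] if you want the interleaved layout from the question.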
Answer 1 (score: 0)
This should give you a matrix showing whether each answer is correct:
t(apply(df[, grep("^q_", names(df))], 1, function(X) X == right_answers))
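A small sketch that runs the row-wise apply() version on hypothetical toy data (q_1 and q_2 only) and compares it with a vectorized alternative, which is an addition and not part of this answer. After t(), each column is one respondent, so the shorter right_answers vector is recycled down each column and lines up question by question.

```r
# Hypothetical toy data with two questions and their two right answers
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)

m <- as.matrix(df[, grep("^q_", names(df))])

# Row-wise version: apply() hands each respondent's answers to the function
by_row <- t(apply(m, 1, function(X) X == right_answers))

# Vectorized version: compare question-by-question, then transpose back
vectorized <- t(t(m) == right_answers)

# Either logical matrix becomes 0/1 with + 0
know <- vectorized + 0
```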
Answer 2 (score: 0)
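This part of your code won't work: df$q_[fx]. The $ operator takes the column name literally, so df$q_ looks for a column called q_, and [fx] then indexes into that result instead of building the name q_1 (the same applies to df$know_[fx]). You can build the column names with paste instead, for example: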
df = read.table(text = "
id q_1 q_2
1 1 2
2 4 3
3 3 2
4 4 1", header = TRUE)
right_answers = c(1,2,3,4,2,3,4,1,2,4)
dat2 = sapply(1:2, function(fx) {
ifelse(df[paste("q",fx,sep = "_")]==right_answers[fx],
1,0)
})
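This doesn't add columns to your data.frame; like @SenorO's answer, it creates a new matrix instead. You can name the matrix columns and then add them to the original data.frame, as follows: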
colnames(dat2) = paste("know", 1:2, sep = "_")
data.frame(df, dat2)
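If you would rather write the know_ columns straight into df, as the question attempted, here is a minimal sketch using a plain for loop instead of lapply. Inside the function passed to lapply, an assignment such as df$know_1 <- ... only changes a local copy of df that is thrown away when the function returns; a for loop runs in the calling environment, so the assignment sticks. [[ ]] accepts a column name computed at run time, unlike $. The loop covers only q_1 and q_2 to match the toy data above; with the full data it would run over seq_along(right_answers).

```r
df <- read.table(text = "
id q_1 q_2
1 1 2
2 4 3
3 3 2
4 4 1", header = TRUE)
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

for (fx in 1:2) {
  q_col <- paste0("q_", fx)        # e.g. "q_1"
  know_col <- paste0("know_", fx)  # e.g. "know_1"
  # Compare the whole column at once and store TRUE/FALSE as 1L/0L
  df[[know_col]] <- as.integer(df[[q_col]] == right_answers[fx])
}
```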
Answer 3 (score: 0)
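I'd like to suggest a different approach to your problem using the reshape2 package. In my view it has these advantages: 1) it's more idiomatic R (for what that's worth), 2) the code is more readable, and 3) it's less error-prone, especially if you want to add more analyses in the future. In this approach everything is done inside data frames, which I think is preferable: it's easier to keep all the values for a single record (here, an id) together, and easier to use the full power of R's tools.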
# Creating a dataframe with the form you describe
df <- data.frame(id=c('1','2','3','4'), q_1 = c(1,4,3,4), q_2 = c(2,3,2,1), q_3 = rep(1, 4), q_4 = rep(2, 4), q_5 = rep(3, 4),
q_6 = rep(4,4), q_7 = c(1,4,3,4), q_8 = c(2,3,2,1), q_9 = rep(1, 4), q_10 = rep(2, 4))
right_answers<-c(1,2,3,4,2,3,4,1,2,4)
# Associating the right answers explicitly with the corresponding question labels in a data frame
answer_df <- data.frame(questions=paste('q', 1:10, sep='_'), right_answers)
library(reshape2)
# "Melting" the dataframe from "wide" to "long" form -- now questions labels are in variable values rather than in column names
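melt_df <- melt(df, id.vars = 'id') # melt function is from reshape2 package; id.vars given explicitly instead of letting melt() guess it from the character column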
# Now merging the correct answers into the data frame containing the observed answers
merge_df <- merge(melt_df, answer_df, by.x='variable', by.y='questions')
# At this point comparing the observed to correct answers is trivial (using as.numeric to convert from logical to 0/1 as you request, though keeping as TRUE/FALSE may be clearer)
merge_df$correct <- as.numeric(merge_df$value==merge_df$right_answers)
# If desirable (not sure it is), put back into "wide" data frame form
cast_obs_df <- dcast(merge_df, id ~ variable, value.var='value') # dcast function is from reshape2 package
cast_cor_df <- dcast(merge_df, id ~ variable, value.var='correct')
names(cast_cor_df) <- gsub('q_', 'know_', names(cast_cor_df))
final_df <- merge(cast_obs_df, cast_cor_df)
The newer tidyr package may be a better choice than reshape2.
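As a rough sketch of the tidyr route, assuming tidyr 1.0 or later (where pivot_longer() and pivot_wider() superseded gather() and spread()), the same melt / merge / dcast steps could look like this. The toy data again has only q_1 and q_2, and answer_df with its question and right_answer columns is a hypothetical lookup table mirroring answer_df above; merge() is base R.

```r
library(tidyr)

# Hypothetical toy data and answer key (the real data has q_1 ... q_10)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
answer_df <- data.frame(question = c("q_1", "q_2"), right_answer = c(1, 2))

# Wide -> long: one row per (id, question), like melt()
long_df <- pivot_longer(df, cols = starts_with("q_"),
                        names_to = "question", values_to = "value")

# Attach each question's right answer and score the response as 1/0
long_df <- merge(long_df, answer_df, by = "question")
long_df$correct <- as.integer(long_df$value == long_df$right_answer)

# Long -> wide on the scores only, like dcast(), then rename q_ to know_
know_df <- pivot_wider(long_df, id_cols = id, names_from = question,
                       values_from = correct)
names(know_df) <- sub("^q_", "know_", names(know_df))

# Put the scores next to the original answers, matched by id
final_df <- merge(df, know_df, by = "id")
```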