Question

The dataset I am using is in this Google Sheet: https://docs.google.com/spreadsheets/d/1eV33Sgx_UVtk2vDtNBc4Yqs_kQoeffY0oj5gSCq9rCs/edit?usp=sharing

AMC.dataset$ExamMC.A<-surveySP15$Exams_A
AMC.dataset$ExamMC.A<-factor(NA, levels=c("TRUE", "FALSE"))
AMC.dataset$ExamMC.A[AMC.dataset$Exams_A=="1 time"|AMC.dataset$Exams_A=="2-4 times"|AMC.dataset$Exams_A==">4 times"]<-"TRUE"
AMC.dataset$ExamMC.A[AMC.dataset$Exams_A=="0 times"]<-"FALSE"
AMC.dataset$ExamMC.A=as.logical(AMC.dataset$ExamMC.A)

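I used these 5 lines of code to recode each of the 9 variables, Exams_A through Exams_I, into a logical binary outcome that is "TRUE" for anyone who answered 1 or more times on that variable. I want to combine all of these variables into a new column in the dataset so that, for each observation row, if even one of the 9 variables Exams_A through Exams_I is "TRUE", the new variable reads "TRUE", meaning the respondent committed at least one of the 9 types of exam academic misconduct recorded in the dataset. If a row has no TRUE results, I want the new variable to read "FALSE", meaning that respondent (observation row) never committed exam academic misconduct.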

My code for this new variable is

 AMC.dataset$ExamMC = any(AMC.dataset$ExamMC.A, AMC.dataset$ExamMC.B, AMC.dataset$ExamMC.C, AMC.dataset$ExamMC.D, AMC.dataset$ExamMC.E, AMC.dataset$ExamMC.F, AMC.dataset$ExamMC.G, AMC.dataset$ExamMC.H, AMC.dataset$ExamMC.I)

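However, the result of this code seems to be overridden by the last variable in the call (AMC.dataset$ExamMC.I), which has 215 FALSE cases and 0 TRUE cases. The new variable comes out as 215 "FALSE", even though other variables in the call may have "TRUE" for some cases.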

Edit

I have now created a data frame for the set of exam misconduct variables

AMC.dataset$ExamMCdf<-data.frame(AMC.dataset$ExamMC.A, AMC.dataset$ExamMC.B, AMC.dataset$ExamMC.C, AMC.dataset$ExamMC.D, AMC.dataset$ExamMC.E, AMC.dataset$ExamMC.F, AMC.dataset$ExamMC.G, AMC.dataset$ExamMC.H, AMC.dataset$ExamMC.I)

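Now my question is how to create a composite variable in a new column that reads each observation row correctly: any row with even one "TRUE" result in the data frame should be marked "TRUE" by the composite variable, and any row with no "TRUE" results should be marked "FALSE".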

Thanks for all your help.

Answer 1

I'm not 100% sure what you're after, but here is how I would do what I think you have in mind:

library(data.table)
setDT(surveySP15)

exams <- paste0("Exams_", LETTERS[1:9])
surveySP15[ , paste0(exams, "_binary") :=
             lapply(.SD, function(x) x %in% c("1 time", "2-4 times", ">4 times")),
           .SDcols = exams]

This creates a variable for each exam (e.g., Exams_A_binary) that is a logical TRUE if it was coded as 1 or more times in the data, and FALSE otherwise. Here is the relevant output:

> surveySP15[ , paste0(exams, "_binary"), with = FALSE]
     Exams_A_binary Exams_B_binary Exams_C_binary Exams_D_binary Exams_E_binary Exams_F_binary Exams_G_binary
  1:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
  2:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
  3:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
  4:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
  5:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
 ---                                                                                                         
223:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
224:           TRUE           TRUE           TRUE          FALSE           TRUE          FALSE          FALSE
225:          FALSE           TRUE          FALSE          FALSE          FALSE          FALSE          FALSE
226:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
227:          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE
     Exams_H_binary Exams_I_binary
  1:          FALSE          FALSE
  2:          FALSE          FALSE
  3:          FALSE          FALSE
  4:          FALSE          FALSE
  5:          FALSE          FALSE
 ---                              
223:          FALSE          FALSE
224:          FALSE          FALSE
225:          FALSE          FALSE
226:          FALSE          FALSE
227:          FALSE          FALSE
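
The binary columns above can then be collapsed into the single composite flag the question asks for. A minimal sketch, assuming data.table is installed; the toy table dt and the column name ExamMC are made up for illustration, and only three of the nine columns are shown:

```r
library(data.table)

# Hypothetical toy data standing in for the *_binary columns created above
dt <- data.table(
  Exams_A_binary = c(FALSE, TRUE,  FALSE),
  Exams_B_binary = c(FALSE, TRUE,  TRUE),
  Exams_C_binary = c(FALSE, FALSE, FALSE)
)

binary_cols <- paste0("Exams_", LETTERS[1:3], "_binary")

# Reduce() applies `|` pairwise across the selected columns, so the result
# has one TRUE/FALSE per row: TRUE if any column is TRUE in that row
dt[, ExamMC := Reduce(`|`, .SD), .SDcols = binary_cols]

print(dt$ExamMC)  # FALSE TRUE TRUE
```

Because the binary columns were built with %in%, they contain no NA values, so `|` never returns NA here.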

Answer 2

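To create a composite column that checks for any TRUE values across other data frame columns, wrap any() in apply() so that it runs row by row. I think you can adapt this to your situation: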

#Makes a dataframe with TRUE/FALSE values and a low chance for TRUE
set.seed(123)
data <- data.frame(
  Exams_A = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
  Exams_B = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
  Exams_C = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
  Exams_D = sample(c(TRUE,FALSE), 10, TRUE, c(.1, .9)),
  Exams_E = rep(TRUE,10) # Inserts a column of all TRUE's to show that you can limit scope
)

data$ExamMC <- apply(data[, 1:4], 1, function(x) any(x))
data$ExamMC <- apply(data[, 1:4], 1, any) # This is the updated version
                          # ^ This part sets what columns you want to search
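
For context on why a plain any(...) call over several columns does not give a per-row answer: any() collapses everything passed to it into a single TRUE/FALSE, and R then recycles that one value down the whole column. A small base-R sketch with made-up data (the names df, wrong, and ExamMC are illustrative) contrasts that with a row-wise result, using rowSums() as a vectorized alternative to apply():

```r
# Hypothetical toy data: three respondents, two logical columns
df <- data.frame(
  ExamMC.A = c(TRUE,  FALSE, FALSE),
  ExamMC.B = c(FALSE, FALSE, TRUE)
)

# any() over whole columns returns ONE value for the entire data set
wrong <- any(df$ExamMC.A, df$ExamMC.B)
length(wrong)  # 1

# Row-wise: count the TRUEs in each row and test for at least one.
# na.rm = TRUE skips missing answers instead of propagating NA.
df$ExamMC <- rowSums(df[, c("ExamMC.A", "ExamMC.B")], na.rm = TRUE) > 0
print(df$ExamMC)  # TRUE FALSE TRUE
```

Note that with na.rm = TRUE a row whose answers are all NA is marked FALSE; drop that argument if such rows should stay NA.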

Combining column variables with the same values into one new variable
