How do I compare two datasets element by element in R?

Asked: 2016-12-08 06:14:05

Tags: r dataframe compare elements

I need to check the test results of 50 different students against the answer key for a multiple-choice test with options A, B, C, and D.

I have a one-dimensional dataset containing the answer key, which I read in as "answers":

answers <- read.table("A1_Ans_only.txt", header = FALSE, sep = ",")

View(answers)

My dataset "results" contains all the answers for all 50 students. I read it in with:

results <- read.csv("Form A1_only.csv", header = FALSE)

View(results)

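When I try something like results==answers, or a function I wrote myself, evaluate <- function(x,y){x == y}, called as evaluate(results,answers), or when I subset each dataset down to one dimension, I get various errors, such as "non-equal length data frames" or factor level sets that differ.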

Can someone help me evaluate each element of the results data frame to determine which questions each student answered correctly?

This is a small sample of results: 


structure(list(V1 = c(1L, 3L, 5L), V2 = c(NA, NA, NA), V3 = structure(c(2L, 
1L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), V4 = structure(c(1L, 
1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"), V5 = structure(c(2L, 
2L, 3L), .Label = c("A", "B", "C", "D"), class = "factor"), V6 = structure(c(1L, 
1L, 1L), .Label = c("A", "B", "C"), class = "factor"), V7 = structure(c(1L, 
1L, 1L), .Label = c("A", "C", "D"), class = "factor"), V8 = structure(c(2L, 
1L, 2L), .Label = c("A", "B", "D"), class = "factor"), V9 = structure(c(1L, 
1L, 1L), .Label = c("A", "C", "D"), class = "factor"), V10 = structure(c(2L, 
2L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), row.names = c(NA, 
3L), class = "data.frame")


This is the sample from answers: 

structure(list(V1 = structure(1L, .Label = "AAAAKEY", class = "factor"), 
V2 = NA, V3 = structure(1L, .Label = "C", class = "factor"), 
V4 = structure(1L, .Label = "A", class = "factor"), V5 = structure(1L, .Label = "C", class = "factor"), 
V6 = structure(1L, .Label = "A", class = "factor"), V7 = structure(1L, .Label = "A", class = "factor"), 
V8 = structure(1L, .Label = "B", class = "factor"), V9 = structure(1L, .Label = "A", class = "factor"), 
V10 = structure(1L, .Label = "B", class = "factor")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), class = "data.frame", row.names = c(NA, 
-1L))

1 answer:

Answer 0 (score: 1)

We can do the comparison after replicating 'answers' so that the lengths are equal:

results==answers[col(results)]
#     V1 V2    V3   V4    V5   V6   V7    V8   V9   V10
#1 FALSE NA FALSE TRUE FALSE TRUE TRUE  TRUE TRUE  TRUE
#2 FALSE NA FALSE TRUE FALSE TRUE TRUE FALSE TRUE  TRUE
#3 FALSE NA FALSE TRUE  TRUE TRUE TRUE  TRUE TRUE FALSE
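
To see why the replication works, here is a minimal sketch on made-up toy data (the column names Q1 to Q3 and the values are assumptions for illustration, not the asker's files). It converts both data frames to character matrices first, which is a variant of the one-liner above rather than the exact same call, so that differing factor levels cannot get in the way.

```r
# Toy data mirroring the question's layout (invented for illustration):
# one row per student, one column per question.
results <- data.frame(
  Q1 = c("B", "A", "D"),
  Q2 = c("A", "A", "A"),
  Q3 = c("B", "B", "C"),
  stringsAsFactors = TRUE
)

# A one-row answer key with the same columns.
answers <- data.frame(Q1 = "C", Q2 = "A", Q3 = "C", stringsAsFactors = TRUE)

# col(results) is a matrix in which every cell holds its own column number.
idx <- col(results)

# Indexing the key with idx repeats each key entry once per student,
# in the same column-major order that 'results' is stored in.
key <- as.matrix(answers)[idx]

# Element-wise comparison: TRUE where the student matched the key.
correct <- as.matrix(results) == key

# Number of correct answers per student.
score <- rowSums(correct)
print(correct)
print(score)
```

Here `correct` has one row per student and one column per question, so `rowSums` gives a per-student score and `colSums` would give a per-question count.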

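The NA in column V2 of 'answers' produces NA in the output, because any equality comparison with NA yields NA. If we need FALSE there instead, either change the NAs to FALSE afterwards, or combine the comparison with !is.na(answers)[col(results)] using &.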
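
Both ways of handling the NA can be sketched as follows, again on invented toy data (the names Q1 to Q3, `cmp1`, and `cmp2` are assumptions for illustration). The comparison is done on character matrices, as a hedged variant of the one-liner above.

```r
# Toy data (invented): Q1 is empty in both the results and the key.
results <- data.frame(
  Q1 = c(NA, NA, NA),
  Q2 = c("B", "A", "D"),
  Q3 = c("A", "A", "A"),
  stringsAsFactors = FALSE
)
answers <- data.frame(Q1 = NA, Q2 = "C", Q3 = "A", stringsAsFactors = FALSE)

# Replicate the key so there is one entry per cell of 'results'.
key <- as.matrix(answers)[col(results)]

# The raw comparison is NA wherever the key (or the answer) is NA.
cmp <- as.matrix(results) == key

# Option 1: replace the NAs with FALSE after the comparison.
cmp1 <- cmp
cmp1[is.na(cmp1)] <- FALSE

# Option 2: AND with "key is not NA"; NA & FALSE evaluates to FALSE.
cmp2 <- cmp & !is.na(key)

print(cmp1)
print(cmp2)
```

The two options agree on this toy data. They can differ when a student's answer is NA but the key is not: option 1 turns that cell into FALSE, while option 2 leaves it as NA because it only masks NAs coming from the key.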