I am trying to compute the accuracy of the participants' responses (columns EQ_C and MEM_C) against the correct responses (columns EQ_R and MEM_R).
dput(example)
structure(list(TRIAL = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15"), EQ_C = c("0101", "1010",
"1010", "00111", "01011", "01101", "100011", "010101", "001101",
"0110011", "1101001", "1100101", "11100001", "11001010", "11001010"
), EQ_R = c("0101", "0010", "1010", "00111", "01011", "01101",
"10101", "11010", "001101", "0100011", "1101001", "0100101",
"11110001", "11001010", "11001010"), MEM_C = c("ZLHK", "RZKX",
"DGWL", "BCJSP", "WRKTJ", "CHBXS", "HNDCWX", "SWVNDT", "WLDGPB",
"DSHRKBV", "HCXLZWB", "HDNBVZC", "BCRHKVDM", "RVTBWKFS", "NWHVZFLD"
), MEM_R = c("ZLHK", "RZKX", "DGWL", "BCJSP", "WRKLTJ", "CHBXS",
"HNDCWX", "SWVDTN", "WLDGPB", "DSHRKBV", "HCXLZWB", "HDNBVZC",
"BCRHKVDM", "RVTBWKFS", "NWHVZFLD"), EQ_SUM = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), MEM_SUM = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
15L), class = "data.frame")
I added new columns to hold the "sum"/accuracy scores for the binary data (EQ) and the letters (MEM).
OSPAN["EQ_SUM"] <- NA
OSPAN["MEM_SUM"]<- NA
I then tried to compute the accuracy with strsplit, but I get an error.
mean(strsplit(OSPAN$MEM_C, "") == strsplit(OSPAN$MEM_R, ""))
Error in strsplit(OSPAN$MEM_C, "") == strsplit(OSPAN$MEM_R, "") : comparison of these types is not implemented
In addition:
Warning messages:
1: In strsplit(OSPAN$MEM_R, "") : input string 342 is invalid UTF-8
2: In strsplit(OSPAN$MEM_R, "") : input string 580 is invalid UTF-8
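My question is:
How can I match the predicted (C) values against the actual (R) values and compute their accuracy or agreement?
For example, in row 1 EQ_SUM would be 1 (or 100%), while in row 2 it would be 0.75 (or 75%), because the participant got one digit wrong (0 in place of 1). So I am after a partial-credit score, not an all-or-nothing exact match.
Thanks.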
Answer 0 (score: 2)
One option is to use the RecordLinkage library:
library(RecordLinkage)
with(df, levenshteinSim(EQ_C, EQ_R))
[1] 1.0000000 0.7500000 1.0000000 1.0000000 1.0000000 1.0000000 0.6666667 0.6666667
[9] 1.0000000 0.8571429 1.0000000 0.8571429 0.8750000 1.0000000 1.0000000
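It computes the similarity between the two strings from their Levenshtein distance, so a score of 1 means the strings are identical.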
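For comparison, a similar score can be sketched in base R with adist(). This assumes the similarity is defined as 1 minus the edit distance divided by the length of the longer string, which gives 0.75 for row 2. The helper name lev_sim is hypothetical and is not part of RecordLinkage.

```r
# Hypothetical helper: Levenshtein-based similarity using only base R.
# Assumption: similarity = 1 - distance / length of the longer string.
lev_sim <- function(a, b) {
  # adist() returns a matrix of edit distances, so compare each pair separately
  d <- mapply(function(x, y) adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  1 - d / pmax(nchar(a), nchar(b))
}

lev_sim(c("0101", "1010"), c("0101", "0010"))
# 1.00 0.75
```

Note that a pair of two empty strings would give NaN (0 / 0), so such rows would need separate handling.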
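Answer 1 (score: 2)
I'm sure there is a more efficient way, but you can compare the split strings element by element, row by row, and write the results into the data frame.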
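# Split each column once, instead of re-splitting every column on each iteration
eq_c  <- strsplit(OSPAN$EQ_C,  "", useBytes = TRUE)
eq_r  <- strsplit(OSPAN$EQ_R,  "", useBytes = TRUE)
mem_c <- strsplit(OSPAN$MEM_C, "", useBytes = TRUE)
mem_r <- strsplit(OSPAN$MEM_R, "", useBytes = TRUE)
for (i in seq_len(nrow(OSPAN))) {
  # Proportion of positions where the response matches the correct answer
  OSPAN$EQ_SUM[i]  <- sum(eq_c[[i]]  == eq_r[[i]])  / length(eq_c[[i]])
  OSPAN$MEM_SUM[i] <- sum(mem_c[[i]] == mem_r[[i]]) / length(mem_c[[i]])
}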
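On the other hand, some rows hold strings of different lengths (for example, MEM_C "WRKTJ" against MEM_R "WRKLTJ" in row 5). How should we handle those?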
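One possible way to handle unequal lengths is to pad the shorter string with NA, count only the positions that match, and divide by the length of the longer string. That scoring rule is an assumption, not something stated in the question, and the helper name position_accuracy is hypothetical. A minimal sketch:

```r
# Hypothetical helper: position-wise accuracy that tolerates unequal lengths.
# Assumption: missing positions count as wrong, and the denominator is the
# length of the longer string.
position_accuracy <- function(chosen, correct) {
  mapply(function(a, b) {
    a <- strsplit(a, "", useBytes = TRUE)[[1]]
    b <- strsplit(b, "", useBytes = TRUE)[[1]]
    n <- max(length(a), length(b))
    if (n == 0) return(NA_real_)   # nothing to compare
    length(a) <- n                 # pads the shorter vector with NA
    length(b) <- n
    sum(a == b, na.rm = TRUE) / n  # NA comparisons are dropped, so they count as misses
  }, chosen, correct, USE.NAMES = FALSE)
}

position_accuracy(c("1010", "WRKTJ"), c("0010", "WRKLTJ"))
# 0.75 0.50
```

With this rule a single missing letter also penalises every letter after it, because the rest of the string is shifted. If that is too strict, an edit-distance score such as the Levenshtein similarity may fit better.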