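I wrote a function that calculates dPrime. It takes a data frame as its argument. This works fine, but the columns must be named "stimDiff" and "stimSame", because the function uses those specific names to calculate dPrime. I want to apply this function to a data frame that contains several subjects and compute dPrime for each subject, so that the result is a new data frame holding each subject's dPrime score. The test data frame looks like this: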
stimDiff0 <- c(rep("diff", 20), rep("same", 5))
stimSame0 <- c(rep("diff", 10), rep("same", 15))
stimDiff1 <- c(rep("diff", 10), rep("same", 15))
stimSame1 <- c(rep("diff", 10), rep("same", 15))
stimDiff2 <- c(rep("diff", 19), rep("same", 6))
stimSame2 <- c(rep("diff", 11), rep("same", 14))
stimDiff3 <- c(rep("diff", 21), rep("same", 4))
stimSame3 <- c(rep("diff", 9), rep("same", 16))
stimDiff4 <- c(rep("diff", 18), rep("same", 7))
stimSame4 <- c(rep("diff", 12), rep("same", 13))
stimDiff5 <- c(rep("diff", 22), rep("same", 3))
stimSame5 <- c(rep("diff", 14), rep("same", 11))
stimDiff <- c(stimDiff0, stimDiff1, stimDiff2,
              stimDiff3, stimDiff4, stimDiff5)
stimSame <- c(stimSame0, stimSame1, stimSame2,
              stimSame3, stimSame4, stimSame5)
subject <- rep(0:5, each = 25)
x <- data.frame(subject = subject, stimDiff = stimDiff, stimSame = stimSame)
I am trying to get dPrime by subject from the data frame with the following code:
tapply(c(x$stimDiff, x$stimSame), x$subject, data = x, FUN = dPrime)
I get the following error:
Error in tapply(list(x$stimDiff, x$stimSame), x$subject, data = x, FUN = dPrime) :
  arguments must have same length
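I know there are packages that can calculate dPrime. I am doing this to learn how to write functions, and I would prefer a solution that uses base R.
Here is the code of the dPrime function: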
dPrime <- function(x) {
  # Calculate number of same, diff and total responses
  # for the stimuli that were actually different
  stimDiffRdiff <- nrow(x[x$stimDiff == 'diff', ])
  stimDiffRsame <- nrow(x[x$stimDiff == 'same', ])
  stimDiffTotal <- length(x$stimDiff)
  # Calculate number of same, diff and total responses
  # for the stimuli that were actually the same
  stimSameRdiff <- nrow(x[x$stimSame == 'diff', ])
  stimSameRsame <- nrow(x[x$stimSame == 'same', ])
  stimSameTotal <- length(x$stimSame)
  # Hit rate = the number of correct responses 'diff'
  # when the stimuli were actually diff, divided by
  # the total number of responses
  hitRate <- stimDiffRdiff / stimDiffTotal
  # Miss rate = the number of incorrect responses
  # 'same' when the stimuli were actually diff
  # divided by the total number of responses
  missRate <- stimDiffRsame / stimDiffTotal
  # False alarm = the number responses 'diff'
  # when the stimuli were actually the same
  # divided by the total number of responses
  falseAlarm <- stimSameRdiff / stimSameTotal
  # Correct rejection = the number of responses
  # same when the stimuli were actually the same
  # divided by the number of total responses
  corrReject <- stimSameRsame / stimSameTotal
  # Calculate z-critical values for hit rate
  # and false alarm rate
  zHitRate <- qnorm(hitRate)
  zFalseAlarm <- qnorm(falseAlarm)
  # Calculate d prime
  dPrime <- zHitRate - zFalseAlarm
  print(dPrime)
}
Answer 0 (score: 2)
Building on @jvcasill's original function and the other users' responses:
dPrime <- function (data, subj = 1, stimDiff = 2, stimSame = 3) {
  # dPrime() returns a vector of the length of the number of subjects
  #+ in data[, subj] that contains the sensitivity index "d'" for each.
  # `data`: data frame
  # `subj`: index of "subject" column in `data`; default is 1
  # `stimDiff`: index of "stimDiff" column in `data`; default is 2
  # `stimSame`: index of "stimSame" column in `data`; default is 3
  if (is.data.frame(data)) {
    # Divide `data` by subject with split(), as have done others who've
    #+ responded to this question
    data.by.subj <- split(data, data[, subj])
    # Calculate number of subjects and create vector of same length
    #+ to return
    n.subj <- length(data.by.subj)
    dPrime.by.subj <- vector(mode = "double", length = n.subj)
    # Loop through "data.by.subj" subject by subject and calculate d'
    for (subj in seq_len(n.subj)) {
      # For clarity, create temporary data set with data of
      #+ current "subj"
      data.tmp <- data.by.subj[[subj]]
      stimDiffRdiff <- nrow(data.tmp[data.tmp[, stimDiff] == "diff", ])
      stimDiffRsame <- nrow(data.tmp[data.tmp[, stimDiff] == "same", ])
      stimDiffTotal <- length(data.tmp[, stimDiff])
      stimSameRdiff <- nrow(data.tmp[data.tmp[, stimSame] == "diff", ])
      stimSameRsame <- nrow(data.tmp[data.tmp[, stimSame] == "same", ])
      stimSameTotal <- length(data.tmp[, stimSame])
      hitRate <- stimDiffRdiff / stimDiffTotal
      missRate <- stimDiffRsame / stimDiffTotal
      falseAlarm <- stimSameRdiff / stimSameTotal
      # The following appears unused in the original function
      # corrReject <- stimSameRsame / stimSameTotal
      zHitRate <- qnorm(hitRate)
      zFalseAlarm <- qnorm(falseAlarm)
      dPrime <- zHitRate - zFalseAlarm
      dPrime.by.subj[subj] <- dPrime
    }
    # For clarity, give each d' value in vector to be returned,
    #+ "dPrime.by.subj", name of corresponding subject
    names(dPrime.by.subj) <- names(data.by.subj)
    return(dPrime.by.subj)
  } else stop("'data' is not a data frame")
}
Note that I am not sure whether the values it returns for the sample data set provided by @jvcasill (the same values one gets with @Splendor's approach) make sense.
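One way to check those values is to compute d' by hand for a single subject. The sketch below assumes the counts from the question's test data for subject 0 (20 of 25 "diff" responses in stimDiff, 10 of 25 in stimSame). The variable names are illustrative only and are not part of the function above.

```r
# Counts taken from the question's test data for subject 0
hitRate    <- 20 / 25   # "diff" responses when the stimuli actually differed
falseAlarm <- 10 / 25   # "diff" responses when the stimuli were actually the same

# d' is the difference between the z-transformed hit and false-alarm rates
dPrimeSubj0 <- qnorm(hitRate) - qnorm(falseAlarm)
dPrimeSubj0
```

This comes out at about 1.095, which agrees with the value reported for subject 0 elsewhere in this thread. Keep in mind that qnorm() returns Inf or -Inf for rates of exactly 1 or 0, so a subject with a perfect hit rate or no false alarms would not get a finite d' from this formula.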
Answer 1 (score: 1)
Try data.table (shown here with the length function in place of dPrime):
library(data.table)
xt = data.table(x)
xt[,list(len=length(c(stimSame,stimDiff))),by=subject]
   subject len
1:       0  50
2:       1  50
3:       2  50
4:       3  50
5:       4  50
6:       5  50
In base R:
sapply(split(x, x$subject), dPrime)
[1] 1.094968
[1] 0
[1] 0.8572718
[1] 1.352917
[1] 0.6329951
[1] 1.024018
        0         1         2         3         4         5
1.0949683 0.0000000 0.8572718 1.3529167 0.6329951 1.0240176
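The values appear twice because of the 'print(dPrime)' statement inside the dPrime function. You should replace it with return(dPrime). Better still, since dPrime is also the name of the function, replace dPrime in the 'dPrime <- zHitRate - zFalseAlarm' statement with another name, such as 'ret':
ret <- zHitRate - zFalseAlarm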
return(ret)
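Put together, the suggestion could look like the sketch below. The name dPrimeValue is hypothetical, and the rates are computed with mean() on a logical vector rather than with nrow() as in the question, which gives the same proportions as long as there are no missing values. The data frame is a two-subject subset built from the counts of subjects 0 and 1 in the question.

```r
# Hypothetical variant of the question's function: it returns the value
# instead of printing it and does not reuse the name dPrime internally
dPrimeValue <- function(x) {
  hitRate    <- mean(x$stimDiff == "diff")  # share of "diff" responses to different stimuli
  falseAlarm <- mean(x$stimSame == "diff")  # share of "diff" responses to same stimuli
  ret <- qnorm(hitRate) - qnorm(falseAlarm)
  return(ret)
}

# Two-subject subset of the question's test data (subjects 0 and 1)
x <- data.frame(
  subject  = rep(0:1, each = 25),
  stimDiff = c(rep("diff", 20), rep("same", 5), rep("diff", 10), rep("same", 15)),
  stimSame = c(rep("diff", 10), rep("same", 15), rep("diff", 10), rep("same", 15))
)

# One value per subject, named by subject, with nothing printed twice
res <- sapply(split(x, x$subject), dPrimeValue)
res
```

Because the function now returns its result silently, sapply() produces only the named vector.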
Answer 2 (score: 1)
Here is a (somewhat inelegant) solution in base R:
Split the data frame into a list with one element per subject:
by.subject <- split(x, x$subject)
Calculate dPrime for each chunk, returning a named numeric vector:
dPrime.values <- unlist(lapply(by.subject, dPrime), use.names = TRUE)
Build the new data frame:
df <- data.frame(dPrime=dPrime.values)
df$subject <- as.numeric(rownames(df))
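For comparison, the original tapply() call fails because tapply() requires its first argument and the grouping vector to have the same length, while concatenating two columns doubles that length. The sketch below shows the mismatch and one possible way to keep using tapply(): group the row indices and subset the data frame inside the function. dPrimeOne is a hypothetical stand-in for a dPrime() that returns its value rather than printing it, and the data are a two-subject subset of the question's test data.

```r
# Two-subject subset of the question's test data (subjects 0 and 1)
x <- data.frame(
  subject  = rep(0:1, each = 25),
  stimDiff = c(rep("diff", 20), rep("same", 5), rep("diff", 10), rep("same", 15)),
  stimSame = c(rep("diff", 10), rep("same", 15), rep("diff", 10), rep("same", 15))
)

# The concatenated columns are twice as long as the grouping vector
nValues <- length(c(x$stimDiff, x$stimSame))  # 100
nGroups <- length(x$subject)                  # 50

# Hypothetical stand-in for a dPrime() that returns its result
dPrimeOne <- function(d) {
  qnorm(mean(d$stimDiff == "diff")) - qnorm(mean(d$stimSame == "diff"))
}

# Group row indices by subject, then subset the data frame per group
dPrime.values <- tapply(seq_len(nrow(x)), x$subject,
                        function(i) dPrimeOne(x[i, ]))

# Assemble the result as a data frame with one row per subject
df <- data.frame(subject = as.numeric(names(dPrime.values)),
                 dPrime  = as.vector(dPrime.values))
df
```

Whether this reads better than split() plus lapply() is a matter of taste; both hand the function one subject's rows at a time.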