Find the percentage of sequences in one column that match the sequences in another column

Time: 2017-08-02 16:44:59

Tags: r

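I hope I can explain this properly. I have a dataset from a memory experiment with two columns that I want to compare. Recall.CRESP is a column that specifies the correct answer for each memory test trial, given as a sequence of grid coordinates. Recall.RESP shows the participant's response.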

The columns look like this:

| Recall.CRESP                   | Recall.RESP                    |
|--------------------------------|--------------------------------|
| grid35grid51grid12grid43grid54 | grid35grid51grid12grid43grid54 |
| grid22grid53grid35grid21grid44 | grid23grid53grid35grid21grid43 |
| grid12grid14grid15grid41grid23 | grid12grid24grid31grid41grid25 |
| grid15grid41grid33grid24grid55 | grid15grid41grid33grid14grid55 |

I have the following line of code, which tells me the percentage of rows in which the two columns are identical:

paste0(100 * with(Data, mean(Recall.CRESP == Recall.RESP, na.rm = TRUE)), "%")

For example, in my dataset Recall.CRESP exactly matches Recall.RESP 20% of the time, which means that subjects scored 5 out of 5 on the memory test on 20% of trials.
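To make the exact-match calculation concrete, here is a minimal sketch that rebuilds the four sample rows from the table above as a data frame called Data (the name used in the question) and applies the same expression. With only these four rows the result is 25%, not the 20% reported for the full dataset.

```r
# The four sample rows from the question, rebuilt as a data frame
Data <- data.frame(
  Recall.CRESP = c("grid35grid51grid12grid43grid54",
                   "grid22grid53grid35grid21grid44",
                   "grid12grid14grid15grid41grid23",
                   "grid15grid41grid33grid24grid55"),
  Recall.RESP  = c("grid35grid51grid12grid43grid54",
                   "grid23grid53grid35grid21grid43",
                   "grid12grid24grid31grid41grid25",
                   "grid15grid41grid33grid14grid55"),
  stringsAsFactors = FALSE
)

# Proportion of rows where the whole response string equals the correct string
exact_match <- with(Data, mean(Recall.CRESP == Recall.RESP, na.rm = TRUE))
pct <- paste0(100 * exact_match, "%")
pct
# [1] "25%"
```

Only the first row is identical in both columns, so one trial out of four counts as a full match.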

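However, I would like to extend this in two ways. First, instead of only getting the percentage of rows that are identical, I want a percentage for partial matches within a sequence. For example, grid11grid42grid22grid51grid32 and grid11grid15grid55grid42grid32 share 2/5 matches, because the first and last grid coordinates are the same. I don't know how to ask R for partial sequence matches of 2/5 (or any other result out of 5). Note also that in this example grid42 appears in both sequences, but it does not count as correctly recalled because it is in the wrong position in Recall.RESP. Order matters in these sequences.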
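The position-by-position comparison described here can be sketched as follows. The helper name split_grid is hypothetical, and the sketch assumes every item has the form "grid" followed by digits and that both sequences have the same length.

```r
# Hypothetical helper: break one sequence string into its grid items
split_grid <- function(x) regmatches(x, gregexpr("grid[0-9]+", x))[[1]]

correct  <- split_grid("grid11grid42grid22grid51grid32")
response <- split_grid("grid11grid15grid55grid42grid32")

# Compare item by item, so an item only counts when it is in the same position
matches <- correct == response
n_correct <- sum(matches)
n_correct
# [1] 2
prop_correct <- mean(matches)
prop_correct
# [1] 0.4
```

grid42 is present in both sequences but sits in position 2 of one and position 4 of the other, so it does not add to the score.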

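The second point is that so far I have described the experiment in terms of checking the accuracy of forward recall of the memory items. However, I also have separate data in which participants recalled the items in backward order. For example, grid11grid22grid33grid44grid55 from Recall.CRESP and a response from Recall.RESP given in reverse order could be a correct match 4/5 times. How can I adapt the code to check the reversed sequence and compute the percentage out of 5?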
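For the backward-recall case, one possible approach is to reverse the response before comparing it item by item. In the sketch below the response string is a made-up illustration (it is not taken from the question's data), and split_grid is a hypothetical helper name.

```r
# Hypothetical helper: break one sequence string into its grid items
split_grid <- function(x) regmatches(x, gregexpr("grid[0-9]+", x))[[1]]

correct  <- split_grid("grid11grid22grid33grid44grid55")
# Made-up backward response with one wrong item (grid15 instead of grid11)
response <- split_grid("grid55grid44grid33grid22grid15")

# rev() flips the response so that position 1 lines up with position 1 again
matches_backward <- correct == rev(response)
n_backward <- sum(matches_backward)
n_backward
# [1] 4
prop_backward <- mean(matches_backward)
prop_backward
# [1] 0.8
```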

Any ideas would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

Here is my solution:

Recall.CRESP <- c('grid35grid51grid12grid43grid54',
                  'grid22grid53grid35grid21grid44',
                  'grid12grid14grid15grid41grid23',
                  'grid15grid41grid33grid24grid55')

Recall.RESP <- c('grid35grid51grid12grid43grid54',
                 'grid23grid53grid35grid21grid43',
                 'grid12grid24grid31grid41grid25',
                 'grid15grid41grid33grid14grid55')

df <- data.frame(Recall.CRESP, Recall.RESP, stringsAsFactors = FALSE)
df$correctNormal <- NA
df$correctReverse <- NA

for (row in seq_len(nrow(df))) {
  # Splitting on 'grid' leaves an empty first element, which [-1] drops
  crespVector <- unlist(strsplit(as.character(df[row, 1]), 'grid'))[-1]
  respVector <- unlist(strsplit(as.character(df[row, 2]), 'grid'))[-1]
  correctNormal <- 0
  correctReverse <- 0
  for (i in seq_along(crespVector)) {
    # Forward recall: same position in both sequences
    if (crespVector[i] == respVector[i]) correctNormal <- correctNormal + 1
    # Backward recall: compare against the response read from its end
    if (crespVector[i] == respVector[length(respVector) + 1 - i]) correctReverse <- correctReverse + 1
  }
  # Proportion correct out of the sequence length (5 in this data)
  df$correctNormal[row] <- correctNormal / length(crespVector)
  df$correctReverse[row] <- correctReverse / length(crespVector)
}

df

##                     Recall.CRESP                    Recall.RESP correctNormal correctReverse
## 1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54           1.0            0.2
## 2 grid22grid53grid35grid21grid44 grid23grid53grid35grid21grid43           0.6            0.2
## 3 grid12grid14grid15grid41grid23 grid12grid24grid31grid41grid25           0.4            0.0
## 4 grid15grid41grid33grid24grid55 grid15grid41grid33grid14grid55           0.8            0.2
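The same per-row scores can also be computed without an explicit loop. The following is a sketch of one alternative, not the answerer's method: score_recall is a hypothetical function name, and it assumes each item is "grid" followed by digits and that the correct and response sequences in a row have the same length.

```r
Recall.CRESP <- c("grid35grid51grid12grid43grid54",
                  "grid22grid53grid35grid21grid44",
                  "grid12grid14grid15grid41grid23",
                  "grid15grid41grid33grid24grid55")

Recall.RESP <- c("grid35grid51grid12grid43grid54",
                 "grid23grid53grid35grid21grid43",
                 "grid12grid24grid31grid41grid25",
                 "grid15grid41grid33grid14grid55")

# Hypothetical helper: proportion of positions recalled correctly per row
score_recall <- function(correct, response, reverse = FALSE) {
  correct_items  <- regmatches(correct, gregexpr("grid[0-9]+", correct))
  response_items <- regmatches(response, gregexpr("grid[0-9]+", response))
  mapply(function(cr, rs) {
    if (reverse) rs <- rev(rs)  # backward recall: flip the response first
    mean(cr == rs)
  }, correct_items, response_items, USE.NAMES = FALSE)
}

forward  <- score_recall(Recall.CRESP, Recall.RESP)
backward <- score_recall(Recall.CRESP, Recall.RESP, reverse = TRUE)
forward
# [1] 1.0 0.6 0.4 0.8
backward
# [1] 0.2 0.2 0.0 0.2
```

The values agree with the correctNormal and correctReverse columns shown above.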

Answer 1: (score: 1)

I would split the strings into the columns of a matrix, which makes them easy to compare and manipulate:

# borrowing Oriol's nicely shared data
Recall.CRESP <- c('grid35grid51grid12grid43grid54',
                  'grid22grid53grid35grid21grid44',
                  'grid12grid14grid15grid41grid23',
                  'grid15grid41grid33grid24grid55')

Recall.RESP <- c('grid35grid51grid12grid43grid54',
                 'grid23grid53grid35grid21grid43',
                 'grid12grid24grid31grid41grid25',
                 'grid15grid41grid33grid14grid55')

# function to create matrices
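matrixify = function(dat) {
    # split each string on "grid" and stack the pieces as matrix rows
    dat = do.call(rbind, strsplit(dat, split = "grid"))
    # the first piece is always empty; drop = FALSE keeps a matrix even for one row
    dat = dat[, -1, drop = FALSE]
    mode(dat) = "numeric"
    return(dat)
}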

cresp_mat = matrixify(Recall.CRESP)
resp_mat = matrixify(Recall.RESP)

## an example of what we made: just the numbers in the right order
cresp_mat
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   35   51   12   43   54
# [2,]   22   53   35   21   44
# [3,]   12   14   15   41   23
# [4,]   15   41   33   24   55

## Calculating results is now easy:
(forwards = rowMeans(cresp_mat == resp_mat))
# [1] 1.0 0.6 0.4 0.8

(reverse = rowMeans(cresp_mat == resp_mat[, 5:1]))
# [1] 0.2 0.2 0.0 0.2

Of course, you can assign the results as new columns of the original data.
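As a sketch of that last step, the block below stores the number of correct positions per trial as new columns and then tabulates the share of trials at each score from 0 to 5, which is one way to read the "percentage with 2/5 matches" asked for in the question. The column names n_forward and n_reverse and the data frame name Data are assumptions made for illustration.

```r
Recall.CRESP <- c("grid35grid51grid12grid43grid54",
                  "grid22grid53grid35grid21grid44",
                  "grid12grid14grid15grid41grid23",
                  "grid15grid41grid33grid24grid55")

Recall.RESP <- c("grid35grid51grid12grid43grid54",
                 "grid23grid53grid35grid21grid43",
                 "grid12grid24grid31grid41grid25",
                 "grid15grid41grid33grid14grid55")

# Same idea as matrixify above: one row per trial, one column per position
matrixify <- function(dat) {
  dat <- do.call(rbind, strsplit(dat, split = "grid"))
  dat <- dat[, -1, drop = FALSE]
  mode(dat) <- "numeric"
  dat
}

cresp_mat <- matrixify(Recall.CRESP)
resp_mat  <- matrixify(Recall.RESP)

Data <- data.frame(Recall.CRESP, Recall.RESP, stringsAsFactors = FALSE)

# Number of correct positions per trial (hypothetical column names)
Data$n_forward <- rowSums(cresp_mat == resp_mat)
Data$n_reverse <- rowSums(cresp_mat == resp_mat[, ncol(resp_mat):1])
Data$n_forward
# [1] 5 3 2 4
Data$n_reverse
# [1] 1 1 0 1

# Share of trials at each possible forward score, 0 through 5
score_share <- prop.table(table(factor(Data$n_forward, levels = 0:ncol(cresp_mat))))
score_share
```

Using ncol(resp_mat):1 instead of 5:1 keeps the reversal correct if the sequence length is ever something other than 5.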