Count previous occurrences of strings across multiple columns in R

Time: 2015-05-14 16:33:30

Tags: r sapply

I have a four-column matrix: a chronological index plus three columns of names (strings). Here is some toy data:

x = rbind(c(1,"sam","harry","joe"), c(2,"joe","sam","jack"),c(3,"jack","joe","jill"),c(4,"harry","jill","joe"))

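I want to create three additional vectors that count, for each row, the previous (but not subsequent) occurrences of each name. This is the desired result for the toy data: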

y = rbind(c(0,0,0),c(1,1,0),c(1,2,0),c(1,1,3))
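To pin down the definition, here is a plain-loop sketch that reproduces the desired result. It walks the name columns row by row, left to right, and records how many times each name has been seen so far. It assumes that an earlier cell in the same row also counts as a "previous" occurrence (the toy data has no repeats within a row, so the question does not settle this); count_previous is a hypothetical helper name.

```r
# Toy data from the question
x <- rbind(c(1, "sam", "harry", "joe"), c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"), c(4, "harry", "jill", "joe"))

# Hypothetical helper: for every cell of names_mat, the number of times that
# name appeared earlier (earlier rows, or earlier columns of the same row)
count_previous <- function(names_mat) {
  out <- matrix(0L, nrow = nrow(names_mat), ncol = ncol(names_mat))
  seen <- integer(0)  # named counter: name -> occurrences seen so far
  for (i in seq_len(nrow(names_mat))) {
    for (j in seq_len(ncol(names_mat))) {
      nm <- names_mat[i, j]
      prev <- if (nm %in% names(seen)) seen[[nm]] else 0L
      out[i, j] <- prev
      seen[nm] <- prev + 1L  # adds the name if it is new
    }
  }
  out
}

print(count_previous(x[, -1]))  # drop the index column first
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3
```

The loop is easy to check by hand but slow on large data; the answers below vectorize the same idea.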

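I couldn't solve it, so I searched Stack Overflow for related examples. The dplyr answers I found compute total counts, but (as far as I can tell) not running counts row by row.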

I tried to write a function that handles this for a single column, but had no luck, e.g.:

thing = sapply(x,function(i)length(grep(i,x[x[1:i]])))
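One likely reason the attempt above fails is that sapply(x, ...) passes each matrix value (a string such as "sam") as i, so 1:i is not a valid index range for most elements. A minimal sketch of the same sapply idea that iterates over positions instead, assuming the name columns are first flattened row by row (count_prev_vec and flat are illustrative names):

```r
x <- rbind(c(1, "sam", "harry", "joe"), c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"), c(4, "harry", "jill", "joe"))

# Hypothetical helper: for each position i, count how many of the
# elements before position i are equal to v[i]
count_prev_vec <- function(v) {
  sapply(seq_along(v), function(i) sum(v[seq_len(i - 1)] == v[i]))
}

flat <- c(t(x[, -1]))  # drop the index column, flatten row by row
res <- matrix(count_prev_vec(flat), ncol = ncol(x) - 1, byrow = TRUE)
res
```

This compares each cell with every earlier cell, so it is quadratic in the number of cells; it is meant to show the indexing fix, not to be fast.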

Any tips would be appreciated.

2 Answers:

Answer 0 (score: 4)

This is a typical seq_along + ave type of problem, but we first need to convert the data to a vector:

Perhaps more readable:

t(`dim<-`(ave(rep(1, prod(dim(x[, -1]))), 
              c(t(x[, -1])), FUN = seq_along)  - 1, 
          rev(dim(x[, -1]))))
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3
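The one-liner above can be unpacked into separate steps. The sketch below uses the same ave + seq_along idea with intermediate variables; flat, prev and res are illustrative names, not taken from the answer.

```r
x <- rbind(c(1, "sam", "harry", "joe"), c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"), c(4, "harry", "jill", "joe"))

names_mat <- x[, -1]        # drop the index column
flat <- c(t(names_mat))     # flatten row by row (row-major order)

# ave() splits its first argument by name and applies seq_along within each
# group, so the k-th occurrence of a name gets k; subtracting 1 turns that
# into "number of previous occurrences"
prev <- ave(seq_along(flat), flat, FUN = seq_along) - 1

# refill row by row to recover the original 4 x 3 shape
res <- matrix(prev, nrow = nrow(names_mat), byrow = TRUE)
res
```

Flattening with c(t(...)) rather than c(...) matters: c() on a matrix reads it column by column, which would count occurrences in the wrong order.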

Answer 1 (score: 2)

You can do it like this:

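# all distinct names in the three name columns
el = unique(c(x[,-1]))
# for each name u: flag its positions in the row-major flattened names,
# replace the flagged entries by 0, 1, 2, ... (the number of earlier matches),
# then add the per-name vectors together
val = Reduce(`+`, lapply(el, function(u) {b=c(t(x[,-1]))==u; b[b==TRUE]=(cumsum(b[b==1])-1); b}))

# refill row by row to get back the original shape
matrix(val, ncol=ncol(x[,-1]), byrow=TRUE)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3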
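To see why the per-name vectors add up to the answer, here is the inner step for a single name, "joe", in isolation (a sketch; joe_prev is an illustrative name). Each position in the flattened data is matched by exactly one name, and every other name contributes 0 there, so summing the per-name vectors fills in every cell.

```r
x <- rbind(c(1, "sam", "harry", "joe"), c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"), c(4, "harry", "jill", "joe"))

flat <- c(t(x[, -1]))  # names in row-major order
b <- flat == "joe"     # TRUE where "joe" appears
which(b)
# [1]  3  4  8 12

# Replace the TRUE entries by 0, 1, 2, ...: assigning numbers into the
# logical vector coerces it to numeric, so the FALSE entries become 0
joe_prev <- b
joe_prev[b] <- cumsum(b[b]) - 1
joe_prev
# [1] 0 0 0 1 0 0 0 2 0 0 0 3
```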