I have a four-column matrix consisting of a chronological index column and three columns of names (strings). Here is some toy data:
x = rbind(c(1,"sam","harry","joe"), c(2,"joe","sam","jack"),c(3,"jack","joe","jill"),c(4,"harry","jill","joe"))
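I want to create three additional vectors that count, for each row, any previous (but not subsequent) occurrences of each name. This would be the desired result for the toy data: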
y = rbind(c(0,0,0),c(1,1,0),c(1,2,0),c(1,1,3))
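I could not work this out, and I searched Stack Overflow for relevant examples. dplyr offers answers for finding total counts, but (as far as I can tell) not row by row.
I tried to write a function that handles this for a single column, but with no luck, i.e.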
thing = sapply(x,function(i)length(grep(i,x[x[1:i]])))
Any tips would be greatly appreciated.
Answer 0 (score: 4)
This is a typical ave + seq_along type of problem, but we need to convert the data to a vector first.
Perhaps more readable:
t(`dim<-`(ave(rep(1, prod(dim(x[, -1]))),
c(t(x[, -1])), FUN = seq_along) - 1,
rev(dim(x[, -1]))))
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 1 0
# [3,] 1 2 0
# [4,] 1 1 3
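The nested call above can be unpacked into named steps. The following is only a sketch of the same ave + seq_along idea, not the answerer's own code; the names names_mat, v, counts and res are made up for illustration.

```r
x <- rbind(c(1, "sam", "harry", "joe"),
           c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"),
           c(4, "harry", "jill", "joe"))

names_mat <- x[, -1]      # drop the index column (hypothetical helper name)
v <- c(t(names_mat))      # flatten row by row, so the order is chronological

# Running count within each name; subtract 1 so that the current
# occurrence itself is not counted.
counts <- ave(seq_along(v), v, FUN = seq_along) - 1

# Fill by row to get back to the original 4 x 3 layout.
res <- matrix(counts, ncol = ncol(names_mat), byrow = TRUE)
res
```

Flattening with t() first matters here: c() on a matrix reads column by column, which would count names in the wrong order.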
Answer 1 (score: 2)
You could do this:
el = unique(c(x[,-1]))
val = Reduce(`+`, lapply(el, function(u) {b = c(t(x[,-1])) == u; b[b] = cumsum(b[b]) - 1; b}))
matrix(val, ncol=ncol(x[,-1]), byrow=TRUE)
# [,1] [,2] [,3]
#[1,] 0 0 0
#[2,] 1 1 0
#[3,] 1 2 0
#[4,] 1 1 3
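For comparison, the same counts can be produced with an explicit loop and a named counter. This is a hedged sketch rather than part of either answer; the names names_mat, seen and res are hypothetical, and it assumes the rows are already in chronological order.

```r
x <- rbind(c(1, "sam", "harry", "joe"),
           c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"),
           c(4, "harry", "jill", "joe"))

names_mat <- x[, -1]
el <- unique(c(names_mat))
seen <- setNames(integer(length(el)), el)   # occurrences seen so far, per name
res <- matrix(0L, nrow(names_mat), ncol(names_mat))

for (i in seq_len(nrow(names_mat))) {
  for (j in seq_len(ncol(names_mat))) {
    nm <- names_mat[i, j]
    res[i, j] <- seen[[nm]]          # record the count before this occurrence
    seen[[nm]] <- seen[[nm]] + 1L    # then register the occurrence
  }
}
res
```

The loop is slower than the vectorised answers on large data, but it makes the "previous occurrences only" rule easy to verify by hand.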