I have a vector made of 0 and non-zero numbers. I would like to know the length and starting-position of each of the non-zero number series:
a = c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 2.6301334, 1.8372030, 0.0000000, 0.0000000, 0.0000000, 1.5632647, 1.1433757, 0.0000000, 1.5412216, 0.8762267, 0.0000000, 1.3087967, 0.0000000, 0.0000000, 0.0000000)
Based on a previous post ("Finding the index of first changes in the elements of a vector in R"), it is easy to find the starting positions of the non-zero regions:
c(1,1+which(diff(a)!=0))
However, I cannot seem to figure out a way of finding the length of these regions.
I have tried the following:
dif=diff(which(a==0))
dif_corrected=dif-1 # to correct for the added lengths
row=rbind(position=seq(length(a)), length=c(1, dif_corrected))
position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
length 1 0 0 0 0 2 0 0 2 2 1 0 0 1 0
NOTE: not all columns are displayed ( there are actually 20)
Then I subset this to take away 0 values:
> row[,-which(row[2,]==0)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
position 1 6 9 10 11 14 19
length 1 2 2 2 1 1 2
This seems like a decent way of coming up with the positions and lengths of each non-zero series in the vector, but it is incorrect:
The position 9 (identified as the start of a non-zero series) is a 0 and instead 10 and 11 are non-zero so I would expect the position 10 and a length of 2 to appear here.... The only result that is correct is position 6 which is the start of the first non-zero series- it is correctly identified as having a length of 2- all other positions are incorrect.
Can anyone tell me how to index correctly to identify the starting-position of each of the non-zero series and the corresponding lengths?
NOTE: I only did this in R because of the usefulness of the which command, but it would also be good to know how to do this in numpy and create a dictionary of positions and length values.
Answer 0 (score: 1)
It seems rle is useful here.
# a slightly simpler vector
a <- c(0, 0, 1, 2, 0, 2, 1, 2, 0, 0, 0, 1)
# runs of zero and non-zero elements
r <- rle(a != 0)
# lengths of non-zero elements
r$lengths[r$values]
# [1] 2 3 1
# start of non-zero runs
cumsum(r$lengths)[r$values] - r$lengths[r$values] + 1
# [1] 3 6 12
This also works for vectors that contain only 0s or only non-0s, and it does not depend on whether the vector starts or ends with a 0 or a non-0. E.g.:
a <- c(1, 1)
a <- c(0, 0)
a <- c(1, 1, 0, 1, 1)
a <- c(0, 0, 1, 1, 0, 0)
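The edge cases listed above can be checked with a small Python sketch of the same run-length idea. This is not part of the original answer. It uses itertools.groupby from the standard library in place of rle, the function name nonzero_runs is made up for illustration, and positions are reported 1-based to match the R output.

```python
from itertools import groupby

def nonzero_runs(values):
    """Return (start, length) for each run of non-zero values; start is 1-based, as in R."""
    runs = []
    position = 1  # 1-based position of the first element of the current run
    for is_nonzero, group in groupby(values, key=lambda v: v != 0):
        length = sum(1 for _ in group)  # run length, like r$lengths
        if is_nonzero:                  # keep only the non-zero runs, like r$values
            runs.append((position, length))
        position += length
    return runs

print(nonzero_runs([0, 0, 1, 2, 0, 2, 1, 2, 0, 0, 0, 1]))  # [(3, 2), (6, 3), (12, 1)]
print(nonzero_runs([1, 1]))              # [(1, 2)]
print(nonzero_runs([0, 0]))              # []
print(nonzero_runs([1, 1, 0, 1, 1]))     # [(1, 2), (4, 2)]
print(nonzero_runs([0, 0, 1, 1, 0, 0]))  # [(3, 2)]
```

Passing the result to dict() gives a mapping from start position to length.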
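A possible alternative uses data.table: rleid creates the groups, and .I is used to get the start index and calculate the length.
If needed, the runs can then easily be subset by the "non-zero" column.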
Answer 1 (score: 1)
For numpy, here is an approach parallel to @Maple's (with a fix for arrays that end with a non-zero value):
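import numpy as np

def subSeries(a):
    # 1 where a is non-zero (within floating-point tolerance), 0 elsewhere
    d = np.logical_not(np.isclose(a, np.zeros_like(a))).astype(int)
    # pad with a zero on both sides so runs touching either end are detected
    edges = np.diff(np.r_[0, d, 0])
    starts = np.where(edges == 1)[0]   # 0-based index of the first element of each run
    ends = np.where(edges == -1)[0]    # 0-based index one past the last element of each run
    return np.c_[starts, ends - starts]  # columns: start, length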
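Since the question asks for a dictionary of positions and lengths, the same padded-difference idea can be sketched with the standard library only. This is an illustrative variant, not the answer's own code. The name run_dict and the tol parameter are assumptions, with tol chosen to mirror the default absolute tolerance of np.isclose.

```python
import math

def run_dict(values, tol=1e-8):
    """Map the 0-based start index of each non-zero run to its length."""
    # 1 where the value is non-zero (outside the tolerance), 0 elsewhere
    flags = [0 if math.isclose(v, 0.0, abs_tol=tol) else 1 for v in values]
    # pad with a zero on both sides so runs touching either end are detected
    padded = [0] + flags + [0]
    steps = [b - a for a, b in zip(padded, padded[1:])]
    starts = [i for i, s in enumerate(steps) if s == 1]   # first index of each run
    ends = [i for i, s in enumerate(steps) if s == -1]    # one past the last index
    return {start: end - start for start, end in zip(starts, ends)}

a = [0, 0, 0, 0, 0, 2.6301334, 1.8372030, 0, 0, 0,
     1.5632647, 1.1433757, 0, 1.5412216, 0.8762267, 0,
     1.3087967, 0, 0, 0]
print(run_dict(a))  # {5: 2, 10: 2, 13: 2, 16: 1}
```

The keys are 0-based, so for the vector from the question they correspond to R positions 6, 11, 14 and 17.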
Answer 2 (score: 0)
Definition:
sublistLen = function(list) {
  z_list <- c(0, list, 0)
  ids_start <- which(diff(z_list != 0) == 1)
  ids_end <- which(diff(z_list != 0) == -1)
  lengths <- ids_end - ids_start
  return(
    list(
      'ids_start' = ids_start,
      'ids_end' = ids_end - 1,
      'lengths' = lengths)
  )
}
Example:
> a <- c(-2,0,0,12,5,0,124,0,0,0,0,4,48,24,12,2,0,9,1)
> sublistLen(a)
$ids_start
[1] 1 4 7 12 18
$ids_end
[1] 1 5 7 16 19
$lengths
[1] 1 2 1 5 2