Removing sequences of positive values between sequences of 0s.

Date: 2014-08-19 12:45:47

Tags: r

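I would like to write a small function that, for a column of a data frame, detects sequences of positive values lying between sequences of 0s and sets them to 0, but only if those positive sequences contain no more than 5 values.

Here is a small example showing what my data look like (the initial_data column) and what I would like to obtain in the end (the final_data column):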

DF<-data.frame(initial_data=c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0),final_data=c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0))

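The problem can also be summarized in one sentence: "If there is a sequence of positive values that is no more than 5 values long and sits between at least two or three 0 values (both before and after the positive sequence), then set that sequence to 0 as well."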

Any suggestions on how to do this?

Thanks a lot!

2 Answers:

Answer 0: (score: 2)

Here is a possible approach using the rle function:

DF<-data.frame(initial_data=c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0),
               final_data=c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0))

# using rle create an object with the sequences of consecutive elements 
# having the same sign (-1 means negative, 0 means zero, 1 means positive)
enc <- rle(sign(DF$initial_data))

# find the positive sequences having maximum 5 elements
posSequences <- which(enc$values == 1 & enc$lengths <= 5)

# remove index=1 or index=length(enc$values) if present because 
# they can't be surrounded by 0
posSequences <- posSequences[posSequences != 1 & 
                             posSequences != length(enc$values)]

# check if they're preceded and followed by at least 2 zeros 
# (if not remove the index); vapply always returns a logical vector,
# even when posSequences is empty (sapply would return list() there)
toForceToZero <- vapply(posSequences, FUN=function(idx){
                                           enc$values[idx-1]==0 &&
                                           enc$lengths[idx-1] >= 2 && 
                                           enc$values[idx+1] == 0 &&
                                           enc$lengths[idx+1] >= 2},
                        FUN.VALUE=logical(1))
posSequences <- posSequences[toForceToZero]

# reverse the run-length encoding, setting NA where we want to force to zero
v <- enc$values
v[posSequences] <- NA

# create the final data vector by forcing NAs to 0  
final_data <- DF$initial_data
final_data[is.na(rep.int(v, enc$lengths))] <- 0

# check that it is equal to your desired output
all(DF$final_data == final_data)

# > [1] TRUE
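
Since the question asks for a small function, the steps above could be wrapped up roughly as follows. This is only a sketch: the name zero_short_runs and the arguments max_len and min_zeros are hypothetical (they do not appear in the answer), and it assumes the input is a numeric vector without NA values.

```r
# Hypothetical helper; name and arguments are illustrative assumptions.
# Assumes x is a numeric vector with no NA values.
zero_short_runs <- function(x, max_len = 5, min_zeros = 2) {
  # runs of consecutive elements with the same sign (-1, 0 or 1)
  enc <- rle(sign(x))
  n <- length(enc$values)

  # candidate runs: positive, at most max_len long, not first or last
  cand <- which(enc$values == 1 & enc$lengths <= max_len)
  cand <- cand[cand > 1 & cand < n]

  # keep only candidates with at least min_zeros zeros on both sides
  keep <- vapply(cand, function(i) {
    isTRUE(enc$values[i - 1] == 0 && enc$lengths[i - 1] >= min_zeros &&
           enc$values[i + 1] == 0 && enc$lengths[i + 1] >= min_zeros)
  }, logical(1))
  cand <- cand[keep]

  # expand the per-run flag to one flag per element, then zero those out
  flag <- rep.int(seq_len(n) %in% cand, enc$lengths)
  x[flag] <- 0
  x
}

initial_data <- c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)
expected     <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)
result <- zero_short_runs(initial_data)
```

With the example data, result should match the final_data column. On a data frame it could be applied as DF$final_data <- zero_short_runs(DF$initial_data).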

Answer 1: (score: 1)

My best friend rle to the rescue:

notzero<-rle(as.logical(unlist(DF)))
Run Length Encoding
  lengths: int [1:7] 4 3 6 8 20 8 7
  values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...

Now just find all locations where values is TRUE and lengths is < 5, and replace values at those locations with FALSE. Then call inverse.rle to get the desired output.
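
A minimal sketch of that idea, under a few assumptions of mine: it is applied to the initial_data column only (unlist(DF) would concatenate both columns of the data frame), it uses <= 5 rather than < 5 to match the question's "no more than 5 values", and it assumes the data contain no negative or NA values. Unlike the first answer, it does not check how many zeros surround a run, and it also zeroes short runs at the very start or end of the vector.

```r
x <- c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)
expected <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)

# run-length encode the "is non-zero" indicator
notzero <- rle(x != 0)

# mark short non-zero runs as if they were zero runs
# (threshold <= 5 is an assumption taken from the question's wording)
notzero$values[notzero$values & notzero$lengths <= 5] <- FALSE

# expand back to one logical per element: TRUE means "keep the value"
keep <- inverse.rle(notzero)

result <- x
result[!keep] <- 0
```

For the example data this should reproduce the final_data column, since the only short non-zero run is 100, 2, 85.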