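I want to write a small function for a data frame that detects sequences of positive values lying between sequences of zeros and sets them to 0, but only when the positive sequence contains no more than 5 values.
Here is a small example showing what my data looks like (the initial_data column) and what I want to end up with (the final_data column):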
DF<-data.frame(initial_data=c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0),final_data=c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0))
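The rule can also be summed up in one sentence: "If there is a sequence of positive values that is no more than 5 values long and sits between at least two or three 0 values (both before and after the positive sequence), then set that sequence to 0 as well."
Any suggestions on how to do this?
Thanks a lot!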
Answer 0 (score: 2)
Here is one possible approach using the rle function:
DF<-data.frame(initial_data=c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0),
final_data=c(0,0,0,0,0,0,0,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0))
# using rle create an object with the sequences of consecutive elements
# having the same sign (-1 means negative, 0 means zero, 1 means positive)
enc <- rle(sign(DF$initial_data))
# find the positive sequences having maximum 5 elements
posSequences <- which(enc$values == 1 & enc$lengths <= 5)
# remove index=1 or index=length(enc$values) if present because
# they can't be surrounded by 0
posSequences <- posSequences[posSequences != 1 &
                             posSequences != length(enc$values)]
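# check if they're preceded and followed by at least 2 zeros
# (if not remove the index)
# vapply always returns a logical vector, even when posSequences is empty
# (sapply would return list() there and break the subsetting below)
toForceToZero <- vapply(posSequences, FUN = function(idx) {
  enc$values[idx - 1] == 0 &&
    enc$lengths[idx - 1] >= 2 &&
    enc$values[idx + 1] == 0 &&
    enc$lengths[idx + 1] >= 2
}, FUN.VALUE = logical(1))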
posSequences <- posSequences[toForceToZero]
# reverse the run-length encoding, setting NA where we want to force to zero
v <- enc$values
v[posSequences] <- NA
# create the final data vector by forcing NAs to 0
final_data <- DF$initial_data
final_data[is.na(rep.int(v, enc$lengths))] <- 0
# check if is equal to your desired output
all(DF$final_data == final_data)
# > [1] TRUE
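The same steps can be wrapped in a reusable function. The following is only a sketch of that idea: the name zero_short_runs and the parameters max_len and min_zeros are hypothetical (they do not appear in the answer above), and it assumes the input is a numeric vector without NA values.

```r
# Hypothetical helper (not part of the original answer): set to 0 every
# positive run of at most `max_len` values that is flanked on both sides
# by a run of at least `min_zeros` zeros. Assumes `x` contains no NA.
zero_short_runs <- function(x, max_len = 5, min_zeros = 2) {
  enc <- rle(sign(x))
  n <- length(enc$values)

  # candidate runs: positive and short enough
  idx <- which(enc$values == 1 & enc$lengths <= max_len)
  # the first and last runs cannot be surrounded by zeros
  idx <- idx[idx != 1 & idx != n]

  # keep only candidates with enough zeros before and after
  flanked <- vapply(idx, function(i) {
    enc$values[i - 1] == 0 && enc$lengths[i - 1] >= min_zeros &&
      enc$values[i + 1] == 0 && enc$lengths[i + 1] >= min_zeros
  }, logical(1))
  idx <- idx[flanked]

  # expand a per-run flag back to one flag per element
  flag <- rep(FALSE, n)
  flag[idx] <- TRUE
  x[rep(flag, enc$lengths)] <- 0
  x
}

x <- c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)
cleaned <- zero_short_runs(x)
```

With the example data this could be used as DF$final_data <- zero_short_runs(DF$initial_data). Exposing the two thresholds as arguments makes it easy to try "two or three" surrounding zeros, as the question leaves that open.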
Answer 1 (score: 1)
My best friend rle to the rescue:
notzero<-rle(as.logical(unlist(DF)))
Run Length Encoding
lengths: int [1:7] 4 3 6 8 20 8 7
values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...
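Now just find all the positions where values is TRUE and lengths is < 5 (use <= 5 if runs of exactly 5 values should also be removed, as the question's "not more than 5" suggests), and replace values at those positions with FALSE. Then call inverse.rle to get the desired output.
Note that unlist(DF) concatenates both columns, which is why the output above shows 7 runs; in practice you would apply rle to DF$initial_data alone.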
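A minimal sketch of how those steps might look on the initial_data values alone. The variable names x, keep and result are chosen here for illustration, and the <= 5 threshold is an assumption taken from the question's wording. Since inverse.rle returns a logical vector in this setup, it is used as a mask over the original values rather than as the final result.

```r
x <- c(0,0,0,0,100,2,85,0,0,0,0,0,0,3,455,24,10,7,6,15,42,0,0,0,0,0,0,0)

# encode runs of "non-zero" (TRUE) and "zero" (FALSE)
notzero <- rle(x != 0)

# assumption: runs of up to 5 values count as "short"
notzero$values[notzero$values & notzero$lengths <= 5] <- FALSE

# expand back to one logical per element and use it as a mask
keep <- inverse.rle(notzero)
result <- x
result[!keep] <- 0
```

Unlike the first answer, this shorter version does not check how many zeros surround a run, and it also removes short runs at the very start or end of the vector, so it only matches the stated rule when those cases do not occur in the data.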