How to extract specific rows from a data frame

Time: 2019-06-18 12:32:01

Tags: r

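Suppose I have a fairly large data frame, with about one million rows.

I want to extract the rows between BSM and ENDBSM in the data frame. How can I do this efficiently?

My idea was to first flag those rows with 1. I use the following loop to do that, but it takes a long time.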

chkSTR = 0  # 1 while inside a BSM ... ENDBSM block, 0 otherwise
for(i in 1:nrow(rData)){
  if(rData$Data[i] == "BSM"){
    chkSTR = 1
  }
  if(rData$Data[i] == "ENDBSM"){
    chkSTR = 0
  }
  rData$BOOL[i] = chkSTR  # note: the ENDBSM row itself is flagged 0
}

Example input data frame

rData = data.frame(
  Data = c(1,"BSM","a",3,3,"ENDBSM",1,3,1,"BSM","b",3,3,"ENDBSM",1,2,1,"BSM","c",2,3,"ENDBSM",1,2)
)


Output example

rData = data.frame(
  Data = c("BSM","a",3,3,"ENDBSM","BSM","b",3,3,"ENDBSM","BSM","c",2,3,"ENDBSM")
)

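3 Answers:

Answer 0: (score: 4)

As mentioned in the comments, the numbers of "BSM" and "ENDBSM" markers are the same and "BSM" always comes first, so we can use mapply to create a sequence between each pair of marker indices and subset the rows with it.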

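# unlist() flattens the result whether mapply returns a matrix (equal block
# lengths) or a list (unequal block lengths)
rData[unlist(mapply(`:`, which(rData$Data == "BSM"), 
                         which(rData$Data == "ENDBSM"))), , drop = FALSE]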
#    Data
#2     BSM
#3       a
#4       3
#5       3
#6  ENDBSM
#10    BSM
#11      b
#12      3
#13      3
#14 ENDBSM
#18    BSM
#19      c
#20      2
#21      3
#22 ENDBSM
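
To make the pairing step more concrete, here is a minimal sketch on a hypothetical toy vector whose blocks have different lengths. The names x, starts, ends and blocks are illustrative and not from the original answer. With unequal blocks mapply cannot simplify to a matrix, so the sketch asks for a list explicitly and flattens it with unlist().

```r
# Hypothetical toy input: two blocks of different lengths
x <- c("1", "BSM", "a", "ENDBSM", "2", "BSM", "b", "c", "ENDBSM")

starts <- which(x == "BSM")     # positions of the opening markers
ends   <- which(x == "ENDBSM")  # positions of the closing markers

# One index sequence per (start, end) pair; SIMPLIFY = FALSE always gives a list
blocks <- mapply(`:`, starts, ends, SIMPLIFY = FALSE)

# Flatten the list into a single integer vector of positions to keep
idx <- unlist(blocks)
x[idx]
```

This still assumes the markers are well paired, with every BSM followed by its own ENDBSM.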

Answer 1: (score: 1)

We can use map2 from purrr

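library(purrr)
# build one index sequence per BSM/ENDBSM pair and flatten them into one vector
idx <- map2(which(rData$Data == "BSM"), which(rData$Data == "ENDBSM"), `:`) %>%
     flatten_int()
# subset the rows, keeping the result a data frame
rData[idx, , drop = FALSE]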
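
For readers without purrr, the same pairing can be sketched in base R with Map, which plays the role of map2 here. This is an illustrative equivalent on the question's sample data, not part of the original answer. The names starts, ends, idx and res are assumptions, and stringsAsFactors = FALSE is set only so the column is character in every R version.

```r
# Sample data from the question
rData <- data.frame(
  Data = c(1, "BSM", "a", 3, 3, "ENDBSM", 1, 3, 1, "BSM", "b", 3, 3, "ENDBSM",
           1, 2, 1, "BSM", "c", 2, 3, "ENDBSM", 1, 2),
  stringsAsFactors = FALSE
)

starts <- which(rData$Data == "BSM")
ends   <- which(rData$Data == "ENDBSM")

# Map() returns a list of index sequences, one per marker pair
idx <- unlist(Map(`:`, starts, ends))

# Keep only the rows inside the blocks, markers included
res <- rData[idx, , drop = FALSE]
res
```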

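Answer 2: (score: 1)

You can use Reduce to build a flip-flop between BSM and ENDBSM. This does not require the numbers of BSM and ENDBSM markers to be equal, nor that BSM comes first. The flag simply switches on when BSM appears and off when ENDBSM appears.

The code below keeps the surrounding BSM and ENDBSM rows. If you want to get rid of them, replace | with & in the subsetting line.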

idx <- Reduce(function(y,x) {(y || x=="BSM") && x!= "ENDBSM"}, x=rData$Data, init=FALSE, accumulate=TRUE)
rData[idx[-1] | idx[-length(idx)], , drop = FALSE]
#     Data
#2     BSM
#3       a
#4       3
#5       3
#6  ENDBSM
#10    BSM
#11      b
#12      3
#13      3
#14 ENDBSM
#18    BSM
#19      c
#20      2
#21      3
#22 ENDBSM
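
As a small check of the flip-flop idea on unbalanced input, here is a sketch on a hypothetical vector with a stray ENDBSM at the start and an unclosed BSM at the end. The names x, state, with_markers and inner are illustrative. It also shows the variant that drops the marker rows by combining the "before" and "after" states with & instead of |.

```r
# Hypothetical input: stray ENDBSM first, last BSM never closed
x <- c("ENDBSM", "1", "BSM", "a", "ENDBSM", "2", "BSM", "b")

# state[k + 1] is the flag after reading element k; state[1] is the initial FALSE
state <- Reduce(function(y, x) (y || x == "BSM") && x != "ENDBSM",
                x = x, accumulate = TRUE, init = FALSE)

after  <- state[-1]              # flag after each element
before <- state[-length(state)]  # flag before each element

with_markers <- after | before   # block contents plus BSM/ENDBSM rows
inner        <- after & before   # block contents only

x[with_markers]
x[inner]
```

Note that Reduce still walks the vector element by element in R code, so on a million rows it may be slower than the index-based answers above.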