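data.table: delete rows that match a given condition?

Asked: 2016-01-06 05:51:01

Tags: r data.table

I am trying to delete the rows where volume equals 0, as well as the row directly below each of those rows. So, for the df below, I want to delete rows 3 and 4: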

head(data1)
           open high  low close volume adj.
2013-12-23 6.32 6.36 6.21  6.22 329400 6.22
2013-12-24 6.27 6.36 6.22  6.30 126500 6.30
2013-12-25 6.30 6.30 6.30  6.30      0 6.30
2013-12-26 6.30 6.36 6.23  6.23 126600 6.23
2013-12-27 6.26 6.28 6.20  6.24  54000 6.24
2013-12-30 6.24 6.50 6.24  6.44  61000 6.44

I have a working solution, but it is embarrassingly long and sloppy:

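library(dplyr)  # provides bind_cols() and the vector version of lag()
if.zero.or.not <- as.data.frame(data1$volume == 0)
combined.data = bind_cols(data1, if.zero.or.not)
colnames(combined.data) = c('open', 'high', 'low', 'close', 'volume', 'adj.', 'ifzero')
# ifzero now flags rows whose previous row had zero volume; the first row has no predecessor
combined.data.shifted = transform(combined.data, ifzero = dplyr::lag(ifzero, default = FALSE))
# keep rows that are neither zero-volume nor directly below a zero-volume row
zeros.and.trues.removed = subset(combined.data.shifted, volume != 0 & !ifzero)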

How can I do this in one or two lines?

2 Answers:

Answer 0: (score: 3)

I'll write this in data.table because I prefer the syntax; translating it to base R is straightforward.

library(xts)  #Needed to get the following "xts" "zoo" object
data1 <- structure(c(6.32, 6.27, 6.3, 6.3, 6.26, 6.24, 6.36, 6.36, 6.3, 
6.36, 6.28, 6.5, 6.21, 6.22, 6.3, 6.23, 6.2, 6.24, 6.22, 6.3, 
6.3, 6.23, 6.24, 6.44, 329400, 126500, 0, 126600, 54000, 61000, 
6.22, 6.3, 6.3, 6.23, 6.24, 6.44), .Dim = c(6L, 6L), .Dimnames = list(
    NULL, c("open", "high", "low", "close", "volume", "adj.")), index = structure(c(1387756800, 
1387843200, 1387929600, 1388016000, 1388102400, 1388361600), tzone = "UTC", tclass = "Date"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", class = c("xts", 
"zoo"))

library(data.table)
#setDT fails on "xts" "zoo" object. We need as.data.table
#setDT(data1) #convert to native 'data.table' class _by reference_

data1 <- as.data.table(data1)
data1[if (!length(rows <- -c(idx <- which(volume == 0), (if (volume[.N] == 0) idx[-length(idx)] else idx) + 1L))) TRUE else rows]

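If your table is very large and contains many clustered zeros, wrapping the c(...) in unique should be more efficient, because consecutive zeros otherwise put the same row numbers into the index vector twice.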
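The unique suggestion could look roughly like the sketch below. The toy table dt and the names idx, nxt, and rows are made up for illustration and are not part of the original answer; the steps are spelled out over several lines rather than packed into one.

```r
library(data.table)

# Hypothetical toy table with a cluster of consecutive zeros
dt <- data.table(volume = c(5, 0, 0, 0, 7, 3))

idx  <- dt[, which(volume == 0)]      # row numbers with zero volume
nxt  <- idx[idx < nrow(dt)] + 1L      # the row below each zero, kept in range
rows <- unique(c(idx, nxt))           # clustered zeros yield duplicates; drop them

# Guard the case with no zeros at all, where there is nothing to remove
result <- if (length(rows)) dt[-rows] else dt
```

Filtering idx with idx < nrow(dt) plays the same role as the volume[.N] == 0 check in the one-liner: it avoids asking for a row past the end of the table.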

If you know for structural reasons that the last row will never be zero, this version is easier on the eyes:

data1[if (!length(rows <- -c(idx <- which(volume == 0), idx + 1L))) TRUE else rows]
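A logical mask built with data.table's shift() may be easier to read than a hand-built index vector. Note that this is a different technique from the one in the answer, sketched on a hypothetical, trimmed-down copy of the sample data (dt with only a date and a volume column).

```r
library(data.table)

# Hypothetical cut-down version of the sample data from the question
dt <- data.table(
  date   = as.Date(c("2013-12-23", "2013-12-24", "2013-12-25",
                     "2013-12-26", "2013-12-27", "2013-12-30")),
  volume = c(329400, 126500, 0, 126600, 54000, 61000)
)

# shift() lags the zero flag by one row; fill = FALSE covers the first row,
# which has no row above it
kept <- dt[!(volume == 0 | shift(volume == 0, n = 1L, fill = FALSE))]
```

Because the lagged flag never reaches past the last row, a zero in the final row needs no special handling, and a table without any zeros is returned unchanged.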

Answer 1: (score: 0)

This might help. Here is an example with sample data.

a  <- c(1,0,9,7,5,0,7,0)
b  <- c(1,9,6,7,4,5,7,8)
dc <- data.frame(a, b)
dc_removed_zero_and_the_next_row <- dc[-c(which(dc$a==0),which(dc$a==0)+1),]
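One caveat with negative indexing in base R: if no row has a zero, which() returns an empty vector and dc[-integer(0), ] selects zero rows instead of all of them. A logical mask avoids that. The sketch below reuses the sample data; the names is_zero, after_zero, and kept are made up for illustration.

```r
a  <- c(1, 0, 9, 7, 5, 0, 7, 0)
b  <- c(1, 9, 6, 7, 4, 5, 7, 8)
dc <- data.frame(a, b)

is_zero    <- dc$a == 0
# TRUE where the previous row was zero; the first row has no predecessor
after_zero <- c(FALSE, head(is_zero, -1))

# Keep rows that are neither zero nor directly below a zero
kept <- dc[!(is_zero | after_zero), ]
```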