Remove (trading) days with low total volume

Asked: 2021-07-13 09:31:27

Tags: r dplyr row volume remove

I have a tick dataset with one row per minute for specific dates. It looks like this:

Date       Time  Open High Low Close Volume Tick.Count Time2 Date2      date_time
1997-09-10 00:01 0    0    0   0     0      0          00:01 1997/09/10 1997-09-10 00:01:00 
1997-09-10 00:02 0    0    0   0     0      0          00:02 1997/09/10 1997-09-10 00:02:00 

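For convenience I only took the first rows, which contain no real prices. If the total Volume over a whole day is below 100, I want to remove that complete trading day.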

Does anyone know how to do this?

Reproducible code (5 rows):

df <- structure(list(
  Date = structure(c(10114, 10114, 10114, 10114, 10114), class = "Date"),
  Time = c("00:01", "00:02", "00:03", "00:04", "00:05"),
  Open = c(0, 0, 0, 0, 0),
  High = c(0, 0, 0, 0, 0),
  Low = c(0, 0, 0, 0, 0),
  Close = c(0, 0, 0, 0, 0),
  Volume = c(0L, 0L, 0L, 0L, 0L),
  Tick.Count = c(0L, 0L, 0L, 0L, 0L),
  Time2 = c("00:01", "00:02", "00:03", "00:04", "00:05"),
  Date2 = c("1997/09/10", "1997/09/10", "1997/09/10", "1997/09/10", "1997/09/10"),
  date_time = structure(list(
    sec = c(0, 0, 0, 0, 0),
    min = 1:5,
    hour = c(0L, 0L, 0L, 0L, 0L),
    mday = c(10L, 10L, 10L, 10L, 10L),
    mon = c(8L, 8L, 8L, 8L, 8L),
    year = c(97L, 97L, 97L, 97L, 97L),
    wday = c(3L, 3L, 3L, 3L, 3L),
    yday = c(252L, 252L, 252L, 252L, 252L),
    isdst = c(1L, 1L, 1L, 1L, 1L),
    zone = c("CEST", "CEST", "CEST", "CEST", "CEST"),
    gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_)
  ), class = c("POSIXlt", "POSIXt"))
), row.names = c(NA, 5L), class = "data.frame")

Thanks in advance. Kind regards, Jürgen

3 answers:

Answer 0: (score: 4)

You can use ave to compute the sum of Volume for each Date, check whether it is >= 100, and use the result to subset df.
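The code for this answer did not survive in the text here. A minimal sketch of the approach it describes follows, run on a small made-up data frame; the object toy and its values are assumptions for illustration, not the asker's data.

```r
# Hypothetical toy data: two trading days, one below and one above the threshold
toy <- data.frame(
  Date   = as.Date(c("1997-09-10", "1997-09-10", "1997-09-11", "1997-09-11")),
  Time   = c("00:01", "00:02", "00:01", "00:02"),
  Volume = c(10L, 20L, 80L, 40L)
)

# ave() returns, for every row, the sum of Volume over that row's Date
daily_volume <- ave(toy$Volume, toy$Date, FUN = sum)

# Keep only the rows that belong to days with a total volume of at least 100
kept <- toy[daily_volume >= 100, ]
kept
```

Because ave returns one value per row rather than one per group, the comparison gives a logical vector that can index the data frame directly, with no merge step.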


Answer 1: (score: 0)

Here are dplyr and data.table alternatives:

#1. dplyr
library(dplyr)
df %>% group_by(Date) %>% filter(sum(Volume, na.rm = TRUE) >= 100) %>% ungroup

#2. data.table
library(data.table)
setDT(df)[, .SD[sum(Volume, na.rm = TRUE) >= 100], Date]
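The five sample rows in the question all have Volume 0, so that day's total is below the threshold and the whole day would be dropped. One way to sanity-check which days should survive on real data is to look at the per-day totals first. The sketch below does this in base R on made-up values; the toy object is hypothetical.

```r
# Hypothetical toy data with a missing Volume value on the second day
toy <- data.frame(
  Date   = as.Date(c("1997-09-10", "1997-09-10",
                     "1997-09-11", "1997-09-11", "1997-09-11")),
  Volume = c(10L, 20L, 80L, NA, 40L)
)

# Total volume per day, ignoring missing values (as na.rm = TRUE does above)
daily_totals <- tapply(toy$Volume, toy$Date, sum, na.rm = TRUE)

# Days that meet the threshold and should remain after filtering
days_to_keep <- names(daily_totals)[daily_totals >= 100]
days_to_keep
```

Comparing days_to_keep with the distinct dates left in the filtered result is a quick check that the grouping and the na.rm handling behave as intended.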

Answer 2: (score: 0)

Using dplyr:

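library(dplyr)
df %>%
     group_by(Date) %>%
     # which() on a single TRUE returns 1, so slice(which(...)) would keep only
     # the first row of each day; select every row of a qualifying day instead
     slice(if (sum(Volume, na.rm = TRUE) >= 100) seq_len(n()) else integer(0)) %>%
     ungroup()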