Question

I have an R data frame with 6 columns. How can I extract the rows whose time falls between 09:16:00.00 and 09:30:00?

Input

20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
20140901 09:17:00.00 7995.85 7999.10 7992.40 7997.00
20140901 09:18:00.00 7996.70 8005.20 7996.40 8004.00
20140901 09:19:00.00 8004.50 8007.95 8003.95 8004.00
20140901 09:20:00.00 8004.30 8005.70 7998.90 7998.95
20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70
20141031 09:34:00.00 8221.60 8225.10 8221.60 8224.80
20141031 09:35:00.00 8224.75 8228.55 8224.70 8227.65
20141031 09:36:00.00 8227.85 8231.25 8227.50 8230.40
20141031 09:37:00.00 8231.00 8237.30 8230.35 8235.95
20141031 09:38:00.00 8236.25 8241.50 8233.60 8234.40
20141031 09:39:00.00 8234.75 8235.00 8229.00 8229.10

Output

20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
20140901 09:17:00.00 7995.85 7999.10 7992.40 7997.00
20140901 09:18:00.00 7996.70 8005.20 7996.40 8004.00
20140901 09:19:00.00 8004.50 8007.95 8003.95 8004.00
20140901 09:20:00.00 8004.30 8005.70 7998.90 7998.95
20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70

Answer 1

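This is a basic subsetting operation; you don't really need dplyr or any other extension package. You just need to pin every time to an arbitrary reference day so that the date-times can be compared with one another: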

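# Put every time of day on the same dummy date so only the clock time differs
times <- as.POSIXct(paste("1970-01-01",dat$V2),tz="UTC")
# Keep rows between 09:16 and 09:30 inclusive on that same dummy date
dat[
  times >= as.POSIXct("1970-01-01 09:16",tz="UTC") &
  times <= as.POSIXct("1970-01-01 09:30",tz="UTC"),
]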

#        V1          V2      V3      V4      V5      V6
#1 20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
#2 20140901 09:17:00.00 7995.85 7999.10 7992.40 7997.00
#3 20140901 09:18:00.00 7996.70 8005.20 7996.40 8004.00
#4 20140901 09:19:00.00 8004.50 8007.95 8003.95 8004.00
#5 20140901 09:20:00.00 8004.30 8005.70 7998.90 7998.95
#6 20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
#7 20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70

If you're feeling bold, you can drop `tz=` and the explicit date and do it all in one go:

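# subset() evaluates `times` inside the transformed data frame,
# so the new column is found without a separate global variable
subset(
  transform(dat, times = as.POSIXct(dat$V2,format="%H:%M:%S")),
  times >= as.POSIXct("09:16",format="%H:%M") &
  times <= as.POSIXct("09:30",format="%H:%M")
)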

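The only edge case I can think of where the latter could break is if it runs at 11:59:59 PM and the clock rolls over to the next day before one of the later as.POSIXct calls. When as.POSIXct is given a time with no date, it simply substitutes the current system date.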
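One way to avoid both the dummy date and the midnight edge case is to compare times of day as plain numbers instead of POSIXct values. This is a minimal sketch, assuming V2 is always a character column in `HH:MM:SS.ss` form; `to_seconds` is a hypothetical helper introduced here, not part of either answer, and the sample `dat` reuses four rows from the question.

```r
# Sample rows copied from the question (V1 = date, V2 = time of day, V3..V6 = prices)
dat <- read.table(text = "20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70
20141031 09:34:00.00 8221.60 8225.10 8221.60 8224.80",
                  header = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: convert "HH:MM:SS.ss" strings to seconds after midnight.
# Assumes every value has exactly three colon-separated fields.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60 + as.numeric(p[3])
  }, numeric(1))
}

secs <- to_seconds(dat$V2)
# Inclusive window: 09:16:00 through 09:30:00, no date involved at all
result <- dat[secs >= to_seconds("09:16:00") & secs <= to_seconds("09:30:00"), ]
result
```

Because no date is ever attached, the result cannot depend on when the code happens to run.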

Answer 2

A perfect candidate for a dplyr operation:

dat <- read.table(text="20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
20140901 09:17:00.00 7995.85 7999.10 7992.40 7997.00
20140901 09:18:00.00 7996.70 8005.20 7996.40 8004.00
20140901 09:19:00.00 8004.50 8007.95 8003.95 8004.00
20140901 09:20:00.00 8004.30 8005.70 7998.90 7998.95
20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70
20141031 09:34:00.00 8221.60 8225.10 8221.60 8224.80
20141031 09:35:00.00 8224.75 8228.55 8224.70 8227.65
20141031 09:36:00.00 8227.85 8231.25 8227.50 8230.40
20141031 09:37:00.00 8231.00 8237.30 8230.35 8235.95
20141031 09:38:00.00 8236.25 8241.50 8233.60 8234.40
20141031 09:39:00.00 8234.75 8235.00 8229.00 8229.10", 
                  header=FALSE, stringsAsFactors=FALSE)


library(dplyr)

dat %>% 
  mutate(timestamp=as.POSIXct(sprintf("%s %s", V1, V2),  
                              format="%Y%m%d %H:%M:%S")) %>%  # real timestamps 
  group_by(V1) %>%                                            # perform operation by day
  filter(timestamp>=as.POSIXct(sprintf("%s 09:16:00.00", V1), 
                               format="%Y%m%d %H:%M:%S")) %>% # >= first HMS
  filter(timestamp<=as.POSIXct(sprintf("%s 09:30:00.00", V1),
                               format="%Y%m%d %H:%M:%S")) %>% # <= last HMS
  ungroup

## Source: local data frame [7 x 7]
## 
##         V1          V2      V3      V4      V5      V6           timestamp
##      (int)       (chr)   (dbl)   (dbl)   (dbl)   (dbl)              (time)
## 1 20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60 2014-09-01 09:16:00
## 2 20140901 09:17:00.00 7995.85 7999.10 7992.40 7997.00 2014-09-01 09:17:00
## 3 20140901 09:18:00.00 7996.70 8005.20 7996.40 8004.00 2014-09-01 09:18:00
## 4 20140901 09:19:00.00 8004.50 8007.95 8003.95 8004.00 2014-09-01 09:19:00
## 5 20140901 09:20:00.00 8004.30 8005.70 7998.90 7998.95 2014-09-01 09:20:00
## 6 20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35 2014-09-01 09:21:00
## 7 20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70 2014-10-31 09:30:00
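Since the window bounds above are built from each row's own V1, the comparison is already element-wise, so the same per-day window can also be expressed without dplyr. This is a minimal base-R sketch under that reading, assuming the same V1/V2 layout; the `tz = "UTC"` argument is an assumption added here so that daylight-saving changes cannot shift the bounds, and the sample `dat` reuses four rows from the question.

```r
# Sample rows copied from the question
dat <- read.table(text = "20140901 09:16:00.00 7994.70 7997.50 7984.00 7996.60
20140901 09:21:00.00 7998.55 7998.70 7994.50 7995.35
20141031 09:30:00.00 8218.70 8221.85 8218.65 8221.70
20141031 09:34:00.00 8221.60 8225.10 8221.60 8224.80",
                  header = FALSE, stringsAsFactors = FALSE)

# Full timestamp per row; %OS also accepts the fractional seconds in V2
stamp <- as.POSIXct(paste(dat$V1, dat$V2), format = "%Y%m%d %H:%M:%OS", tz = "UTC")
# Per-row bounds built from each row's own date (V1)
lower <- as.POSIXct(paste(dat$V1, "09:16:00"), format = "%Y%m%d %H:%M:%S", tz = "UTC")
upper <- as.POSIXct(paste(dat$V1, "09:30:00"), format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Element-wise comparison, so no grouping step is needed
result <- dat[stamp >= lower & stamp <= upper, ]
result
```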

Extracting data by time of day from an R data frame
