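I have a data frame with 3 columns. (https://pastebin.com/DFqUuuDp)
The first two columns ("Time1", "Time2") contain datetime data, and both are POSIXct in the format "%Y-%m-%d %H:%M:%S".
What I need in the end is a subselection of rows where, for a given time in Time1, only the rows with a suitable Time2 are selected (in the examples below, a Time2 on the previous day at 00:00:00 is right, one at 12:00:00 is wrong).
A correct example: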
+---------------------+----------------------+
| Time1 | Time2 |
+---------------------+----------------------+
| 2016-11-01 00:00:00 | 2016-10-31 00:00:00 |
+---------------------+----------------------+
An incorrect example:
+---------------------+----------------------+
| Time1 | Time2 |
+---------------------+----------------------+
| 2016-11-01 00:00:00 | 2016-10-31 12:00:00 |
+---------------------+----------------------+
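In the uploaded file I manually added a third column ("Value") as a guide to the rows I want to end up with after filtering. The rows with "TRUE" are the ones of interest.
I solved it with two for loops, but that is very slow on large tables.
Answer 0 (score: 2)
The following solution works. It uses the data provided by the OP.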
library(dplyr)
library(lubridate)
df %>%
  filter(as.Date(Time2) + days(1) == as.Date(Time1) & hour(Time2) < 12)
# Time1 Time2 Value
# 1 2016-11-01 00:00:00 2016-10-31 TRUE
# 2 2016-11-01 00:30:00 2016-10-31 TRUE
# 3 2016-11-01 01:00:00 2016-10-31 TRUE
# 4 2016-11-01 01:30:00 2016-10-31 TRUE
# 5 2016-11-01 02:00:00 2016-10-31 TRUE
# 6 2016-11-01 02:30:00 2016-10-31 TRUE
# 7 2016-11-01 03:00:00 2016-10-31 TRUE
# 8 2016-11-01 03:30:00 2016-10-31 TRUE
# 9 2016-11-01 04:00:00 2016-10-31 TRUE
# 10 2016-11-01 04:30:00 2016-10-31 TRUE
# so on
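To try the filter without downloading the full dataset, here is a minimal sketch on a hypothetical three-row data frame (`df_demo` and its values are made up for illustration and are not the OP's data). It assumes both columns are POSIXct in UTC and uses lubridate's `hour()` so that the noon check is a numeric comparison.

```r
library(dplyr)
library(lubridate)

# Hypothetical three-row data frame in UTC (not the OP's data)
df_demo <- data.frame(
  Time1 = as.POSIXct(rep("2016-11-01 00:00:00", 3), tz = "UTC"),
  Time2 = as.POSIXct(c("2016-10-30 00:00:00",
                       "2016-10-31 00:00:00",
                       "2016-10-31 12:00:00"), tz = "UTC")
)

# Keep rows where Time2 is on the day before Time1 and before noon;
# hour() returns a number, so the comparison with 12 is numeric
res <- df_demo %>%
  filter(as.Date(Time2) + days(1) == as.Date(Time1), hour(Time2) < 12)

print(res)
```

Only the second row should survive: the first is two days earlier, and the third is at noon.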
Answer 1 (score: 2)
The question has been tagged data.table. In addition, the sample dataset provided by the OP is of class data.table. So I feel obliged to post a data.table solution:
library(data.table)
DT[as.IDate(Time1) - 1L == as.IDate(Time2) & hour(Time2) < 12]
                 Time1      Time2 Value
1: 2016-11-01 00:00:00 2016-10-31  TRUE
2: 2016-11-01 00:30:00 2016-10-31  TRUE
3: 2016-11-01 01:00:00 2016-10-31  TRUE
4: 2016-11-01 01:30:00 2016-10-31  TRUE
# check if result is correct
identical(DT[as.IDate(Time1) - 1L == as.IDate(Time2) & hour(Time2) < 12L],
DT[Value == "TRUE"])
[1] TRUE
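as.IDate() converts to IDate, a Date class with integer storage for fast sorting and grouping. So we can use integer arithmetic to compute the previous day. hour() is also provided by the data.table package and returns the hour of the day as an integer value.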
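The two building blocks can be checked in isolation. The following is a small sketch on hypothetical timestamps (`t1`, `t2` and their values are illustrative, not taken from the OP's data), assuming UTC throughout:

```r
library(data.table)

# Hypothetical sample timestamps in UTC (not taken from the OP's data)
t1 <- as.POSIXct("2016-11-01 00:30:00", tz = "UTC")
t2 <- as.POSIXct(c("2016-10-31 00:00:00", "2016-10-31 12:00:00"), tz = "UTC")

# IDate is stored as an integer number of days, so subtracting 1L
# yields the previous calendar day and keeps the IDate class
prev_day <- as.IDate(t1) - 1L

# hour() returns the hour of the day as an integer
hrs <- hour(t2)

# Both conditions combined: only the first Time2 value qualifies
keep <- as.IDate(t2) == prev_day & hrs < 12L
print(keep)
```

Note that if lubridate is attached after data.table, its own `hour()` masks this one; for this check either should behave the same way.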
Data copied from the pastebin link provided by the OP on 2018-05-29 at 22:00 UTC, but with the .internal.selfref pointer removed:
DT <- structure(
list(
Time1 = structure(c(1477958400, 1477958400, 1477958400,
1477958400, 1477958400, 1477958400, 1477958400, 1477960200, 1477960200,
1477960200, 1477960200, 1477960200, 1477960200, 1477960200, 1477962000,
1477962000, 1477962000, 1477962000, 1477962000, 1477962000, 1477962000,
1477963800, 1477963800, 1477963800, 1477963800, 1477963800, 1477963800,
1477963800),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Time2 = structure(c(1477699200, 1477742400, 1477785600, 1477828800,
1477872000, 1477915200, 1477958400, 1477699200, 1477742400,
1477785600, 1477828800, 1477872000, 1477915200, 1477958400,
1477699200, 1477742400, 1477785600, 1477828800, 1477872000,
1477915200, 1477958400, 1477699200, 1477742400, 1477785600,
1477828800, 1477872000, 1477915200, 1477958400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Value = c("FALSE", "FALSE", "FALSE",
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "TRUE", "FALSE", "FALSE")),
.Names = c("Time1", "Time2", "Value"),
row.names = c(NA, -28L),
class = c("data.table", "data.frame"))
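Answer 2 (score: 1)
A solution using subset and the lubridate package can take the following approach:
Add 1 day to Time2, then check that Time1 and Time2 fall on the same day.
Format Time2 as HHMMSS, then check that it is less than 120000 (noon).
Code: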
library(lubridate)
subset(df, format(Time1,"%Y%m%d") == format(Time2+days(1),"%Y%m%d") &
as.integer(format(Time2, "%H%M%S")) < 120000 )
# Time1 Time2 Value
# 19 2016-11-01 00:00:00 2016-10-31 TRUE
# 39 2016-11-01 00:30:00 2016-10-31 TRUE
# 59 2016-11-01 01:00:00 2016-10-31 TRUE
# 79 2016-11-01 01:30:00 2016-10-31 TRUE
# 99 2016-11-01 02:00:00 2016-10-31 TRUE
# 119 2016-11-01 02:30:00 2016-10-31 TRUE
# 139 2016-11-01 03:00:00 2016-10-31 TRUE
# 159 2016-11-01 03:30:00 2016-10-31 TRUE
# 179 2016-11-01 04:00:00 2016-10-31 TRUE
#
# so on
Note: Time2 is 00:00:00 for every row in the subset, so the time part does not appear in the printout above.
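This printing behaviour comes from the default format for date-times in base R, which drops the time part when every element is at midnight. A short sketch on hypothetical timestamps (the vector names and values are illustrative only):

```r
# Hypothetical timestamps in UTC (not the OP's data)
all_midnight <- as.POSIXct(c("2016-10-30 00:00:00", "2016-10-31 00:00:00"),
                           tz = "UTC")
mixed <- as.POSIXct(c("2016-10-31 00:00:00", "2016-10-31 12:00:00"),
                    tz = "UTC")

# With the default format, the time part is dropped only when every
# element is exactly at midnight
fmt_midnight <- format(all_midnight)  # dates only
fmt_mixed <- format(mixed)            # dates and times

# An explicit format string always shows the time part
fmt_explicit <- format(all_midnight, "%Y-%m-%d %H:%M:%S")

print(fmt_midnight)
print(fmt_mixed)
print(fmt_explicit)
```

The underlying values are unchanged; only the display differs, so the filter conditions are not affected.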
Data:
head(df, 20)
# Time1 Time2 Value
# 1 2016-11-01 2016-10-22 00:00:00 FALSE
# 2 2016-11-01 2016-10-22 12:00:00 FALSE
# 3 2016-11-01 2016-10-23 00:00:00 FALSE
# 4 2016-11-01 2016-10-23 12:00:00 FALSE
# 5 2016-11-01 2016-10-24 00:00:00 FALSE
# 6 2016-11-01 2016-10-24 12:00:00 FALSE
# 7 2016-11-01 2016-10-25 00:00:00 FALSE
# 8 2016-11-01 2016-10-25 12:00:00 FALSE
# 9 2016-11-01 2016-10-26 00:00:00 FALSE
# 10 2016-11-01 2016-10-26 12:00:00 FALSE
# 11 2016-11-01 2016-10-27 00:00:00 FALSE
# 12 2016-11-01 2016-10-27 12:00:00 FALSE
# 13 2016-11-01 2016-10-28 00:00:00 FALSE
# 14 2016-11-01 2016-10-28 12:00:00 FALSE
# 15 2016-11-01 2016-10-29 00:00:00 FALSE
# 16 2016-11-01 2016-10-29 12:00:00 FALSE
# 17 2016-11-01 2016-10-30 00:00:00 FALSE
# 18 2016-11-01 2016-10-30 12:00:00 FALSE
# 19 2016-11-01 2016-10-31 00:00:00 TRUE
# 20 2016-11-01 2016-10-31 12:00:00 FALSE