Subset a data frame by time

Date: 2013-02-06 12:46:40

Tags: r

I am working with a data frame in which I previously combined the date and time into a single column (called timestamp):

a <- c(1:21)
D <- c("2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14")
Time <- c("18:40:37", "18:40:48", "18:40:58", "18:41:08","18:41:18","18:41:28","18:41:38","18:41:48","18:41:58","18:42:08","18:42:18","18:42:28","18:42:38","18:42:48","18:42:58","18:43:08","18:43:18","18:42:28", "18:44:18", "18:44:28", "18:44:28")
df1 <- data.frame(a, D, Time)
df1 <- within(df1, { timestamp=format(as.POSIXct(paste(D, Time)), "%d/%m/%Y %H:%M:%S") })

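How can I subset the data frame to exclude values after a particular point in time? I found some code in similar questions on Stack Overflow that I thought would help, but I am struggling to get the time element to work: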

subset(df1, format.Date(timestamp, "%d/%m/%Y %H:%M:%S") > "14/12/2012 18:42:00")

Any suggestions would be much appreciated.

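Edit: I am struggling to get the code detailed below to work with my real data. The dput() of the first four rows of my data frame is listed at the end of this post. I previously used the line of code recommended by @Arun to add a timestamp to my data.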

gps <- within(gps, { timestamp=format(as.POSIXct(paste(LOCAL.DATE, LOCAL.TIME)), 
                                      "%d/%m/%Y %H:%M:%S") })

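If I try to apply the second part of the code (strptime ...), I get this error message:

Error in `$<-.data.frame`(`*tmp*`, "timestamp", value = list(sec = c(37, :   replacement has 30208 rows, data has 4

This would explain why, when I try to apply the code to my whole data set, I get 8 rows of multiple numbers separated by commas. If you could help me in any way I would be very grateful.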

structure(list(timestamp = c("14/12/2012 18:40:37", "14/12/2012 18:40:48", 
"14/12/2012 18:40:58", "14/12/2012 18:41:08"), LATITUDE = c(54.77769505, 
54.77765729, 54.77768751, 54.7777021), LONGITUDE = c(-1.56627049, 
-1.56639255, -1.56626555, -1.56662523), HEIGHT = c(" 173.911 M", 
" 161.742 M", " 146.905 M", " 138.016 M"), SPEED = c(" 0.465 km/h", 
" 0.728 km/h", " 4.574 km/h", " 17.335 km/h")), .Names = c("timestamp", 
"LATITUDE", "LONGITUDE", "HEIGHT", "SPEED"), row.names = c(NA, 
4L), class = "data.frame")

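Second edit: Many thanks to @Arun for the solution. I was a little confused about how to use the code, because my data were originally in separate date and time columns (LOCAL.DATE and LOCAL.TIME). So I used the first line of code from the original solution, and then the second line of code from the revised edit.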

This is the code I used:

gps <- within(gps, { timestamp=format(as.POSIXct(paste(LOCAL.DATE, LOCAL.TIME)), 
                                      "%d/%m/%Y %H:%M:%S") })

gps$timestamp <- strptime(gps$timestamp, "%Y-%m-%d %H:%M:%S")

However, now I get a string of NAs (and some -1 values). Apologies if I am using the code the wrong way...
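One possible reading of the NAs, offered as a sketch rather than a confirmed diagnosis: the timestamp column is written as day/month/year, while the format string passed to strptime expects year-month-day. The snippet below parses a single value taken from the dput() output in this post under both format strings. The names `x`, `bad` and `good` are illustrative, and `tz = "UTC"` is an assumption, since the post does not state a time zone.

```r
# One timestamp value as it appears in the dput() output of this post
x <- "14/12/2012 18:40:37"

# The format string does not match the text layout, so parsing yields NA
bad <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The format string matches the text layout, so parsing succeeds
good <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

print(is.na(bad))
print(format(good, "%Y-%m-%d %H:%M:%S"))
```

If this reading is right, the format string has to describe the text as it is currently stored, not the layout wanted after conversion.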

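Third edit: Apologies to @Arun for the confusion. I get errors when I try the date column both ways round. If I keep it as yr/m/d, which is how the original data are formatted, I get this dput():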

structure(list(timestamp = c("2012/12/14 18:40:37", "2012/12/14 18:40:48", 
"2012/12/14 18:40:58", "2012/12/14 18:41:08"), LATITUDE = c(54.77769505, 
54.77765729, 54.77768751, 54.7777021), LONGITUDE = c(-1.56627049, 
-1.56639255, -1.56626555, -1.56662523), HEIGHT = c(" 173.911 M", 
" 161.742 M", " 146.905 M", " 138.016 M"), SPEED = c(" 0.465 km/h", 
" 0.728 km/h", " 4.574 km/h", " 17.335 km/h")), .Names = c("timestamp", 
"LATITUDE", "LONGITUDE", "HEIGHT", "SPEED"), row.names = c(NA, 
4L), class = "data.frame")

If I then use:

gps2$timestamp <- strptime(gps2$timestamp, "%Y/%m/%d %H:%M:%S")

...and try to view the data frame in the workspace window of RStudio, the R session aborts.

1 Answer:

Answer 0 (score: 3)

It is best to load the character vectors as characters rather than factors, by using stringsAsFactors = FALSE (as shown below):

# make sure character columns are not converted to factors
df1 <- data.frame(a, D, Time, stringsAsFactors = FALSE)

Then,

df1 <- within(df1, { timestamp=format(as.POSIXct(paste(D, Time)), 
                               "%d/%m/%Y %H:%M:%S") })
# convert timestamp here
df1$timestamp <- strptime(df1$timestamp, "%d/%m/%Y %H:%M:%S")

Now, try subsetting this way:

# now subset
subset(df1, timestamp > strptime("14/12/2012 18:42:00", "%d/%m/%Y %H:%M:%S"))


#     a          D     Time           timestamp
# 10 10 2012/12/14 18:42:08 2012-12-14 18:42:08
# 11 11 2012/12/14 18:42:18 2012-12-14 18:42:18
# 12 12 2012/12/14 18:42:28 2012-12-14 18:42:28
# 13 13 2012/12/14 18:42:38 2012-12-14 18:42:38
# 14 14 2012/12/14 18:42:48 2012-12-14 18:42:48
# 15 15 2012/12/14 18:42:58 2012-12-14 18:42:58
# 16 16 2012/12/14 18:43:08 2012-12-14 18:43:08
# 17 17 2012/12/14 18:43:18 2012-12-14 18:43:18
# 18 18 2012/12/14 18:42:28 2012-12-14 18:42:28
# 19 19 2012/12/14 18:44:18 2012-12-14 18:44:18
# 20 20 2012/12/14 18:44:28 2012-12-14 18:44:28
# 21 21 2012/12/14 18:44:28 2012-12-14 18:44:28
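As a variation on the steps above, here is a minimal sketch that skips the format() round trip and parses the date and time straight into a POSIXct column, then subsets on either side of the cut-off. It is an alternative under stated assumptions, not the answer's own method: `tz = "UTC"` is assumed because the question does not give a time zone, and the names `cutoff`, `after` and `before` are illustrative.

```r
# Rebuild the example data from the question (Time values copied verbatim)
a <- 1:21
D <- rep("2012/12/14", 21)
Time <- c("18:40:37", "18:40:48", "18:40:58", "18:41:08", "18:41:18",
          "18:41:28", "18:41:38", "18:41:48", "18:41:58", "18:42:08",
          "18:42:18", "18:42:28", "18:42:38", "18:42:48", "18:42:58",
          "18:43:08", "18:43:18", "18:42:28", "18:44:18", "18:44:28",
          "18:44:28")
df1 <- data.frame(a, D, Time, stringsAsFactors = FALSE)

# Parse date and time directly into POSIXct (tz = "UTC" is an assumption)
df1$timestamp <- as.POSIXct(paste(df1$D, df1$Time),
                            format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Cut-off as a date-time value, so the comparison is chronological
cutoff <- as.POSIXct("2012-12-14 18:42:00", tz = "UTC")

after  <- df1[df1$timestamp > cutoff, ]   # rows later than the cut-off
before <- df1[df1$timestamp <= cutoff, ]  # rows up to and including it

print(nrow(after))
print(nrow(before))
```

Comparing date-time values rather than formatted strings matters here because day/month/year text does not sort in chronological order.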

Edit: Try this:

df1 <- within(df1, { timestamp=as.POSIXct(timestamp, format = "%d/%m/%Y %H:%M:%S") })
df1$timestamp <- strptime(df1$timestamp, "%Y-%m-%d %H:%M:%S")
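For the GPS data in the question, a similar sketch may be useful. strptime returns a POSIXlt object, which is stored as a list of components (sec, min, hour and so on), and that is consistent with the `value = list(sec = ...` fragment in the reported error. POSIXct stores each date-time as a single number, which tends to fit more easily in a data frame column. Whether this avoids the aborted session reported in the question cannot be confirmed from the post. The data frame below reuses the first two columns of the third-edit dput(); `tz = "UTC"` and the name `early` are assumptions.

```r
# First four rows from the third-edit dput(), reduced to two columns
gps2 <- data.frame(
  timestamp = c("2012/12/14 18:40:37", "2012/12/14 18:40:48",
                "2012/12/14 18:40:58", "2012/12/14 18:41:08"),
  LATITUDE = c(54.77769505, 54.77765729, 54.77768751, 54.7777021),
  stringsAsFactors = FALSE
)

# Convert the text column to POSIXct; the format matches the stored text
gps2$timestamp <- as.POSIXct(gps2$timestamp,
                             format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Keep only fixes recorded before 18:41:00
early <- subset(gps2, timestamp < as.POSIXct("2012-12-14 18:41:00", tz = "UTC"))

print(class(gps2$timestamp))
print(nrow(early))
```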