Question

Suppose I have a data frame and want to select a training set based on a time range:

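# Note: as written, x and y are passed to seq() rather than data.frame(),
# so they are ignored and df ends up with only the timestamp column.
df <- data.frame(timestamp = seq(as.POSIXct('2013-08-02 12:00:00'),
                                as.POSIXct('2013-08-06 05:00:00'), len = 45,
                   x = sample(1:100, 45), y = sample(200:500, 45)))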

I then convert the timestamps to row names:

row.names(df) = df$timestamp

Since the rows are now indexed by timestamp, I expected to be able to select a range for the training set:

 # Select the range 
 s = '2013-08-02 12:00:00'
 e = '2013-08-03 10:15:00'

 # Select the training dataset 

 training = df[s:e,]

But when I run the code above, I get the following error:

 #Error in s:e : NA/NaN argument
 #In addition: Warning messages:
 #1: In `[.data.frame`(df, s:e, ) : NAs introduced by coercion
 #2: In `[.data.frame`(df, s:e, ) : NAs introduced by coercion

Can someone explain what I'm doing wrong?

I know that ts or other packages can solve this, but I'm looking for a base R solution.

I looked at this answer before posting the question:

Select rows within a particular time range

Answer 1

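The colon operator (:) does not give you a range of rows to select. Given two strings, s:e tries to convert them to numbers, gets NA, and fails with the error you see. Instead, find the corresponding row indices, build a sequence between them, and then subset: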

df[which(row.names(df) == s) : which(row.names(df) == e), , drop = FALSE]

#                              timestamp
#2013-08-02 12:00:00 2013-08-02 12:00:00
#2013-08-02 14:01:21 2013-08-02 14:01:21
#2013-08-02 16:02:43 2013-08-02 16:02:43
#2013-08-02 18:04:05 2013-08-02 18:04:05
#2013-08-02 20:05:27 2013-08-02 20:05:27
#2013-08-02 22:06:49 2013-08-02 22:06:49
#2013-08-03 00:08:10 2013-08-03 00:08:10
#2013-08-03 02:09:32 2013-08-03 02:09:32
#2013-08-03 04:10:54 2013-08-03 04:10:54
#2013-08-03 06:12:16 2013-08-03 06:12:16
#2013-08-03 08:13:38 2013-08-03 08:13:38
#2013-08-03 10:15:00 2013-08-03 10:15:00
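To see why s:e fails in the first place, here is a small sketch that reproduces the coercion and then shows that : works once the endpoints are integer row positions. The hourly UTC timestamps are assumed example data, not the question's data frame.

```r
# The two endpoints from the question, as character strings.
s <- '2013-08-02 12:00:00'
e <- '2013-08-03 10:15:00'

# `:` builds numeric sequences, so R coerces each string to a number.
# A timestamp string is not a number, so the coercion yields NA ...
bad <- suppressWarnings(as.numeric(s))
print(is.na(bad))

# ... and `:` rejects NA endpoints, which is the error from the question.
msg <- tryCatch(suppressWarnings(s:e),
                error = function(err) conditionMessage(err))
print(msg)

# Row positions are plain integers, so `:` works once they are looked up.
# Assumed example data: hourly timestamps in UTC, formatted as strings.
rn <- format(seq(as.POSIXct('2013-08-02 12:00:00', tz = 'UTC'),
                 by = 'hour', length.out = 48),
             '%Y-%m-%d %H:%M:%S')
start <- which(rn == '2013-08-02 12:00:00')
end <- which(rn == '2013-08-02 15:00:00')
print(start:end)
```

This is the same idea as the which(...) : which(...) line above: look up positions first, then let : work on integers.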

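If s or e can match more than one row, which.max is the better choice, because it returns the index of the first maximum, that is, the first TRUE. With which, the : operator would receive several values, use only the first one, and emit a warning.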
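A short sketch of the difference, using an assumed vector of timestamp strings with one repeated value. It also shows a caveat worth knowing: which.max on an all-FALSE vector returns 1 rather than signalling that nothing matched.

```r
# Assumed example: character timestamps where one value is repeated.
stamps <- c('2013-08-02 12:00:00', '2013-08-02 13:00:00',
            '2013-08-02 13:00:00', '2013-08-02 14:00:00')
target <- '2013-08-02 13:00:00'

all_hits <- which(stamps == target)       # every matching position
first_hit <- which.max(stamps == target)  # position of the first TRUE only

# Caveat: with no match at all, which.max() still returns 1,
# so check with any() before trusting the index.
absent <- '2013-08-02 18:00:00'
no_hit <- which.max(stamps == absent)
found <- any(stamps == absent)

print(all_hits)
print(first_hit)
print(no_hit)
print(found)
```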

Also, you don't need to convert to row names at all. You can use the timestamp column itself for the same purpose.

df[which.max(df$timestamp == s) : which.max(df$timestamp == e), , drop = FALSE]
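The which.max version can be wrapped in a small function that also guards against the no-match case. This is a sketch under assumptions: select_between is a hypothetical name, the example data is made up, and times are parsed in UTC to keep the example independent of the local time zone.

```r
# Hypothetical helper: rows from the first exact match of s through the
# first exact match of e in df$timestamp; no row names are needed.
select_between <- function(df, s, e, tz = 'UTC') {
  s_hit <- df$timestamp == as.POSIXct(s, tz = tz)
  e_hit <- df$timestamp == as.POSIXct(e, tz = tz)
  # which.max() silently returns 1 when nothing matches, so check first.
  if (!any(s_hit) || !any(e_hit)) stop('start or end timestamp not found')
  df[which.max(s_hit):which.max(e_hit), , drop = FALSE]
}

# Assumed example data: a timestamp every 2 hours, in UTC.
df <- data.frame(
  timestamp = seq(as.POSIXct('2013-08-02 12:00:00', tz = 'UTC'),
                  by = '2 hours', length.out = 10),
  x = 1:10
)
training <- select_between(df, '2013-08-02 12:00:00', '2013-08-02 18:00:00')
print(training)
```

Both endpoints must appear exactly in the column; an endpoint that falls between two timestamps raises an error instead of silently selecting from row 1.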

Answer 2

Here is a simple logical-indexing approach:

inx <- as.POSIXct(s) <= row.names(df) & row.names(df) <= as.POSIXct(e)
df[inx, ]

I've kept it in two steps for clarity, but you can make it a one-liner.
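As a one-liner, the same comparison can be made directly on the POSIXct column, which avoids comparing dates against character row names. A minimal sketch, assuming made-up 2-hourly data in UTC and bounds that deliberately do not coincide with any timestamp:

```r
# Assumed example data: a timestamp every 2 hours, in UTC.
df <- data.frame(
  timestamp = seq(as.POSIXct('2013-08-02 12:00:00', tz = 'UTC'),
                  by = '2 hours', length.out = 10),
  x = 1:10
)

# Bounds that need not match any timestamp exactly.
s <- as.POSIXct('2013-08-02 13:00:00', tz = 'UTC')
e <- as.POSIXct('2013-08-02 19:30:00', tz = 'UTC')

# Keep every row whose timestamp lies within [s, e].
training <- df[df$timestamp >= s & df$timestamp <= e, , drop = FALSE]
print(training)
```

Unlike the which-based approach, this keeps every row inside the interval even when neither bound appears in the data. subset(df, timestamp >= s & timestamp <= e) is an equivalent base R spelling.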

Selecting a data range from timestamps based on row names