Splitting with conditions

Time: 2016-03-23 23:13:59

Tags: r

I have a data table:

library(data.table)

mydata = data.table(MyTimes = as.POSIXct(c("2015-01-01 00:00:03","2015-01-01 00:00:04","2015-01-01 00:00:18","2015-01-01 00:00:48","2015-01-01 00:00:48","2015-01-01 00:00:54","2015-01-01 00:01:12","2015-01-01 00:01:45"), tz = "GMT"), othercol = c(1,2,3,4,5,6,7,1))


 mydata
               MyTimes othercol
1: 2015-01-01 00:00:03        1
2: 2015-01-01 00:00:04        2
3: 2015-01-01 00:00:18        3
4: 2015-01-01 00:00:48        4
5: 2015-01-01 00:00:48        5
6: 2015-01-01 00:00:54        6
7: 2015-01-01 00:01:12        7
8: 2015-01-01 00:01:45        1

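The data is sorted by time, and I want to split this data frame into 2 data frames, subject to two conditions:

  1. If possible, the break should be in the middle.
  2. Rows around the break that fall within the same SECOND must end up in the same data frame.
  3. In this example there are 8 rows, so I would like to break it in the middle, 4 rows each. Notice, however, that 00:00:48 would then appear in both data frames, which point #2 above does not allow. In other words, the break must never separate rows that share the same second.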
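
The conditions above can be read as "choose the allowed break position closest to the middle row". Below is a minimal sketch of that reading in base R, not code from the question. The helper name `split_near_middle`, its `time_col` argument, and the plain data frame `mydata_df` are assumptions made for illustration; it also assumes the rows are already sorted by time and that a tie between two equally good break positions goes to the earlier one.

```r
# Hypothetical helper: split a time-sorted data frame into two parts,
# breaking as close to the middle row as possible without separating
# rows that fall within the same second.
split_near_middle <- function(df, time_col = "MyTimes") {
  # Compare timestamps at whole-second resolution
  secs <- floor(as.numeric(df[[time_col]]))
  n <- length(secs)
  # A break after row i is allowed only if rows i and i + 1 differ in time
  allowed <- which(diff(secs) != 0)
  if (length(allowed) == 0) {
    stop("all rows share the same second, so no valid break exists")
  }
  # Pick the allowed break nearest to n / 2 (which.min keeps the first on ties)
  cut <- allowed[which.min(abs(allowed - n / 2))]
  list(first  = df[seq_len(cut), , drop = FALSE],
       second = df[(cut + 1):n, , drop = FALSE])
}

# Same values as in the question, stored in a plain data frame
mydata_df <- data.frame(
  MyTimes = as.POSIXct(c("2015-01-01 00:00:03", "2015-01-01 00:00:04",
                         "2015-01-01 00:00:18", "2015-01-01 00:00:48",
                         "2015-01-01 00:00:48", "2015-01-01 00:00:54",
                         "2015-01-01 00:01:12", "2015-01-01 00:01:45"),
                       tz = "GMT"),
  othercol = c(1, 2, 3, 4, 5, 6, 7, 1)
)

parts <- split_near_middle(mydata_df)
nrow(parts$first)   # 3
nrow(parts$second)  # 5
```

With this data a break after row 4 is not allowed, and the breaks after row 3 and after row 5 are equally far from the middle, so the sketch returns the 3/5 split; preferring the later break instead would give the 5/3 split.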

    So here the data frames could be

    data frame 1:
                       MyTimes othercol
         2015-01-01 00:00:03        1
         2015-01-01 00:00:04        2
         2015-01-01 00:00:18        3
         2015-01-01 00:00:48        4
         2015-01-01 00:00:48        5
    
    data frame 2:
         2015-01-01 00:00:54        6
         2015-01-01 00:01:12        7
         2015-01-01 00:01:45        1
    

    Or it could be like this:

    data frame1:
       2015-01-01 00:00:03        1
       2015-01-01 00:00:04        2
       2015-01-01 00:00:18        3
    
    data frame2:
        2015-01-01 00:00:48        4
        2015-01-01 00:00:48        5
        2015-01-01 00:00:54        6
        2015-01-01 00:01:12        7
        2015-01-01 00:01:45        1
    

    Either way, both 00:00:48 rows are in the same data frame.

2 answers:

Answer 0: (score: 1)

How about this?

split(mydata, as.numeric(mydata$MyTimes) < median(as.numeric(mydata$MyTimes)))
$`FALSE`
               MyTimes secondcol
1: 2015-01-01 00:00:48         4
2: 2015-01-01 00:00:48         5
3: 2015-01-01 00:00:54         6
4: 2015-01-01 00:01:12         7
5: 2015-01-01 00:01:45         8

$`TRUE`
               MyTimes secondcol
1: 2015-01-01 00:00:03         1
2: 2015-01-01 00:00:04         2
3: 2015-01-01 00:00:18         3
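
This approach compares time values rather than row positions, so rows with equal timestamps always receive the same TRUE/FALSE label and cannot be separated. The following base R sketch is only an illustration of that property on the question's timestamps; the variable names `times`, `below_median`, and `labels_per_time` are made up for this example.

```r
# Timestamps from the question
times <- as.POSIXct(c("2015-01-01 00:00:03", "2015-01-01 00:00:04",
                      "2015-01-01 00:00:18", "2015-01-01 00:00:48",
                      "2015-01-01 00:00:48", "2015-01-01 00:00:54",
                      "2015-01-01 00:01:12", "2015-01-01 00:01:45"),
                    tz = "GMT")

# Same grouping rule as in the split() call
below_median <- as.numeric(times) < median(as.numeric(times))

# Count how many different labels each distinct timestamp received
labels_per_time <- tapply(below_median, as.numeric(times),
                          function(x) length(unique(x)))
all(labels_per_time == 1)  # TRUE: no timestamp is spread over both groups

# Sizes of the two groups
table(below_median)
```

Note that the two groups are not guaranteed to be equal in size: here 3 rows fall below the median and 5 do not, and data with many repeated timestamps around the median could be more lopsided.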

答案 1 :(得分:0)

Not as elegant as @DatamineR's solution, but an alternative using run-length encoding is

library(data.table)

mydata[, grp := rleid(MyTimes)]  ## put times into groups
split(mydata, mydata$grp >= ceiling(max(mydata$grp)/2))

$`FALSE`
               MyTimes othercol grp
1: 2015-01-01 00:00:03        1   1
2: 2015-01-01 00:00:04        2   2
3: 2015-01-01 00:00:18        3   3

$`TRUE`
               MyTimes othercol grp
1: 2015-01-01 00:00:48        4   4
2: 2015-01-01 00:00:48        5   4
3: 2015-01-01 00:00:54        6   5
4: 2015-01-01 00:01:12        7   6
5: 2015-01-01 00:01:45        8   7
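
For readers without data.table, the grouping step can be approximated in base R. The sketch below is a stand-in for `rleid()` on a single sorted vector, not the answer's own code; it assumes the timestamps are sorted, and the names `times`, `grp`, and `second_half` are chosen for illustration.

```r
# Timestamps from the question (already sorted)
times <- as.POSIXct(c("2015-01-01 00:00:03", "2015-01-01 00:00:04",
                      "2015-01-01 00:00:18", "2015-01-01 00:00:48",
                      "2015-01-01 00:00:48", "2015-01-01 00:00:54",
                      "2015-01-01 00:01:12", "2015-01-01 00:01:45"),
                    tz = "GMT")

# Start a new group whenever the time differs from the previous row;
# the cumulative sum of those "change" flags gives a run id per row
grp <- cumsum(c(TRUE, diff(as.numeric(times)) != 0))
grp  # 1 2 3 4 4 5 6 7

# Same split rule as above: groups at or beyond the middle group id
second_half <- grp >= ceiling(max(grp) / 2)
sum(!second_half)  # 3 rows in the first part
sum(second_half)   # 5 rows in the second part
```

Because the cut is made on group ids rather than row numbers, it balances the number of distinct timestamps, not the number of rows, so the parts can differ in size when some timestamps repeat often.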