How do I split rows with the same ID by date, and then sort them by time, in R?

Time: 2017-08-07 01:11:52

Tags: r sorting grouping

I have a data frame like this:

deviceid        date                          speed
325           2016/09/12 07:55:40               50
325           2016/09/12 08:55:40               90
325           2016/09/13 06:55:40               40
325           2016/09/13 09:55:40               90
325           2016/09/13 08:55:40               69
325           2016/09/14 08:55:40               99
5525          2016/09/12 09:55:40               60
5525          2016/09/12 06:55:40               90
5525          2016/09/15 03:55:40               63
4325          2016/09/12 08:55:40               99
4325          2016/09/12 07:55:40               30
4325          2016/09/14 10:55:40               70

I want to change it so that it looks like this:

deviceid             date                        speed
325_12           2016/09/12 07:55:40               50
325_12           2016/09/12 08:55:40               90
325_13           2016/09/13 06:55:40               40
325_13           2016/09/13 08:55:40               69
325_13           2016/09/13 09:55:40               90
325_14           2016/09/14 08:55:40               99
5525_12          2016/09/12 06:55:40               90
5525_12          2016/09/12 09:55:40               60
5525_15          2016/09/15 03:55:40               63
4325_12          2016/09/12 07:55:40               30
4325_12          2016/09/12 08:55:40               99
4325_14          2016/09/14 10:55:40               70

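The main reason for doing this is that I then want to sort the times within each group, with one group per device and date. So the output should look like the table above.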

3 Answers:

Answer 0: (score: 3)

We can use format() to extract the day of the month and paste() to append it to deviceid:

paste(df$deviceid, format(as.POSIXct(df$date), "%d"), sep = "_")

#[1] "325_12"  "325_12"  "325_13"  "325_13"  "325_13"  "325_14"  "5525_12"
#[8] "5525_12" "5525_15" "4325_12" "4325_12" "4325_14"
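
The call above only builds the new ID strings; it does not reorder any rows. The following is a minimal base R sketch of one way to also get the ordering shown in the question, assuming the devices should stay in their order of first appearance (325, 5525, 4325) and that rows are then sorted by timestamp. The names dt, dev_rank and sorted are illustrative and not part of the original answer.

```r
# Sample data copied from the question
df <- data.frame(
  deviceid = c(325, 325, 325, 325, 325, 325, 5525, 5525, 5525, 4325, 4325, 4325),
  date = c("2016/09/12 07:55:40", "2016/09/12 08:55:40", "2016/09/13 06:55:40",
           "2016/09/13 09:55:40", "2016/09/13 08:55:40", "2016/09/14 08:55:40",
           "2016/09/12 09:55:40", "2016/09/12 06:55:40", "2016/09/15 03:55:40",
           "2016/09/12 08:55:40", "2016/09/12 07:55:40", "2016/09/14 10:55:40"),
  speed = c(50, 90, 40, 90, 69, 99, 60, 90, 63, 99, 30, 70),
  stringsAsFactors = FALSE
)

# Parse the timestamps once (the time zone is an assumption; the data has none)
dt <- as.POSIXct(df$date, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Rank each device by its first appearance so the original device order is kept
dev_rank <- match(df$deviceid, unique(df$deviceid))

# Append the day of the month to the device ID, e.g. "325_12"
df$deviceid <- paste(df$deviceid, format(dt, "%d"), sep = "_")

# Sort by device first, then chronologically within each device
sorted <- df[order(dev_rank, dt), ]
rownames(sorted) <- NULL
sorted
```

Sorting on the parsed timestamp covers both the date and the time, so rows from the same device and day end up adjacent without needing a separate sort on the new ID.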

Answer 1: (score: 2)

You can do this with paste() and gsub():

df$deviceid = paste(df$deviceid,gsub("\\d+/\\d+/(\\d+).*","\\1",df$date),sep="_")
   deviceid                date speed
1    325_12 2016/09/12 07:55:40    50
2    325_12 2016/09/12 08:55:40    90
3    325_13 2016/09/13 06:55:40    40
4    325_13 2016/09/13 09:55:40    90
5    325_13 2016/09/13 08:55:40    69
6    325_14 2016/09/14 08:55:40    99
7   5525_12 2016/09/12 09:55:40    60
8   5525_12 2016/09/12 06:55:40    90
9   5525_15 2016/09/15 03:55:40    63
10  4325_12 2016/09/12 08:55:40    99
11  4325_12 2016/09/12 07:55:40    30
12  4325_14 2016/09/14 10:55:40    70

Answer 2: (score: 0)

The same result written as a pipeline, which may fit better into your workflow:

library(lubridate)
library(tidyverse)
library(stringr)

df <- data.frame(
  deviceid = c(325, 325, 325, 325, 325, 325, 5525, 5525, 5525, 4325, 4325, 4325),
  date = c("2016/09/12 07:55:40", "2016/09/12 08:55:40",
           "2016/09/13 06:55:40", "2016/09/13 09:55:40",
           "2016/09/13 08:55:40", "2016/09/14 08:55:40",
           "2016/09/12 09:55:40", "2016/09/12 06:55:40",
           "2016/09/15 03:55:40", "2016/09/12 08:55:40",
           "2016/09/12 07:55:40", "2016/09/14 10:55:40"),
  speed = c(50, 90, 40, 90, 69, 99, 60, 90, 63, 99, 30, 70)
)


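df$date <- ymd_hms(df$date) # parse the strings as date-times (POSIXct) using lubridate

# Append the zero-padded day of the month (not the year) to deviceid, e.g. "325_12"
df %>%
  mutate(deviceid = paste(deviceid, str_pad(day(date), 2, pad = "0"), sep = "_"))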
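
The pipeline above stops at relabelling deviceid, while the question also asks for the rows to be ordered by time within each group. A hedged sketch of how the same pipeline could be extended with dplyr's arrange() follows; it assumes the devices should keep their order of first appearance, and the helper column dev_rank and the name result are hypothetical, introduced only for this illustration.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
df <- data.frame(
  deviceid = c(325, 325, 325, 325, 325, 325, 5525, 5525, 5525, 4325, 4325, 4325),
  date = c("2016/09/12 07:55:40", "2016/09/12 08:55:40", "2016/09/13 06:55:40",
           "2016/09/13 09:55:40", "2016/09/13 08:55:40", "2016/09/14 08:55:40",
           "2016/09/12 09:55:40", "2016/09/12 06:55:40", "2016/09/15 03:55:40",
           "2016/09/12 08:55:40", "2016/09/12 07:55:40", "2016/09/14 10:55:40"),
  speed = c(50, 90, 40, 90, 69, 99, 60, 90, 63, 99, 30, 70),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    date     = ymd_hms(date),                              # parse to POSIXct (UTC by default)
    dev_rank = match(deviceid, unique(deviceid)),          # order of first appearance
    deviceid = paste(deviceid, format(date, "%d"), sep = "_")  # e.g. "325_12"
  ) %>%
  arrange(dev_rank, date) %>%   # device first, then chronological
  select(-dev_rank)             # drop the helper column

result
```

If numeric device order is acceptable instead, the dev_rank helper can be dropped and the original numeric deviceid used as the first sort key before it is overwritten.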