How to remove the date and timezone from a timestamp column in R

Time: 2017-01-06 14:59:12

Tags: r

I have a data frame "movesdata" with a Start_time column, for example:

Group   Start_time                      End_time
walking 2016-10-10T12:02:54+02:00   2016-10-10T12:06:18+02:00
walking 2016-10-10T12:06:19+02:00   2016-10-10T12:16:47+02:00
walking 2016-10-10T12:16:55+02:00   2016-10-10T12:17:14+02:00

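I want to reduce the values in the Start_time and End_time columns, e.g. "2016-10-10T12:02:54+02:00", to just the time, 12:02:54. I'd like to drop everything else, but I can't figure out how. The problem is that the date changes every 3-4 rows, while the GMT offset (+02:00) stays constant. I want neither the date (2016-10-10T) nor the GMT offset (+02:00). Can anyone help?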

2 Answers:

Answer 0: (score: 0)

There are two approaches. The first is to do an actual date conversion:

> d <- strptime ("2016-10-10T12:02:54+02:00", "%Y-%m-%dT%H:%M:%S+02:00")
> d
[1] "2016-10-10 12:02:54 EDT"
> format (d, "%H:%M:%S")
[1] "12:02:54"

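I cheated a little at the end of the format string: strptime expects a timezone offset written as +0200 rather than +02:00, so I matched "+02:00" as literal text, which assumes the offset stays constant.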
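If the offset could vary, one possible alternative (a sketch, not part of the original answer) is to strip the colon from the offset so that `%z` can parse it. Note that this applies the offset and yields the instant in UTC, not the local wall-clock time the question asks for, so it is mainly useful for seeing what the "cheat" ignores. The second timestamp below is made up for illustration.

```r
# Sketch: parse the offset properly instead of hard-coding "+02:00".
# The second value is a hypothetical timestamp with a different offset.
x <- c("2016-10-10T12:02:54+02:00", "2016-10-11T08:15:00+01:00")

# Drop the colon in the offset so it matches %z ("+02:00" -> "+0200").
x_fixed <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# With %z the offset is applied, so each value becomes an instant in UTC.
parsed <- as.POSIXct(strptime(x_fixed, "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))
utc_time <- format(parsed, "%H:%M:%S", tz = "UTC")
print(utc_time)
```

Because the question wants the time as written (12:02:54), treating the offset as literal text, as in the answer, is the simpler route when the offset really is constant.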

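The second approach is to use grep-style regular expressions, which you really should learn, although in this case the pattern gets rather complicated: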

> gsub ("^.+T([0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\\+.+$",  "\\1", "2016-10-10T12:02:54+02:00")
[1] "12:02:54"

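(grep itself is actually a bit awkward to use in R for this common use case, so I used gsub instead. It is easier here.)

I'd recommend the first option in this case, because it flags invalid dates (strptime returns NA for them), whereas the second option will happily accept a time such as 25:37:99. You might assume the incoming times are all valid, but defensive programming is always a good idea. And of course, if you ever need to reformat the date/time, you are better off using the date/time functions.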
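To make that difference concrete, here is a small sketch comparing the two approaches on a deliberately invalid timestamp. The input value and the variable names are made up for illustration; `tz = "UTC"` is an added assumption so the result doesn't depend on the local timezone.

```r
fmt <- "%Y-%m-%dT%H:%M:%S+02:00"
bad <- "2016-10-10T25:37:99+02:00"  # hypothetical invalid timestamp

# Approach 1: date conversion. The invalid time cannot be parsed.
via_strptime <- format(strptime(bad, fmt, tz = "UTC"), "%H:%M:%S")

# Approach 2: regular expression. Only the shape of the string is checked.
via_gsub <- gsub("^.+T([0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\\+.+$", "\\1", bad)

print(is.na(via_strptime))  # TRUE: the invalid time is flagged as NA
print(via_gsub)             # "25:37:99": passed through unchecked
```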

Remember that you can do this as a vectorized operation:

movesdata$startTime <- format (strptime (movesdata$Start_time, "%Y-%m-%dT%H:%M:%S+02:00"), "%H:%M:%S")

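I only used a single string above for illustration. (I gave the target column a different name so you can compare the two. I would keep the original column until I knew the operation did what I expected.)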
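As a minimal sketch of the vectorized version applied to both columns, the following rebuilds the three sample rows from the question. The helper name `extract_time` and the `endTime` column are made up here, and `tz = "UTC"` is an added assumption to keep the parse independent of the local timezone.

```r
# Sample rows copied from the question.
movesdata <- data.frame(
  Group = rep("walking", 3),
  Start_time = c("2016-10-10T12:02:54+02:00",
                 "2016-10-10T12:06:19+02:00",
                 "2016-10-10T12:16:55+02:00"),
  End_time = c("2016-10-10T12:06:18+02:00",
               "2016-10-10T12:16:47+02:00",
               "2016-10-10T12:17:14+02:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse with the literal "+02:00" suffix, keep the time.
extract_time <- function(x) {
  format(strptime(x, "%Y-%m-%dT%H:%M:%S+02:00", tz = "UTC"), "%H:%M:%S")
}

# Write to new columns so the originals remain available for comparison.
movesdata$startTime <- extract_time(movesdata$Start_time)
movesdata$endTime <- extract_time(movesdata$End_time)

# Any NA here would mean a value did not match the expected format.
stopifnot(!anyNA(movesdata$startTime), !anyNA(movesdata$endTime))
print(movesdata[, c("Group", "startTime", "endTime")])
```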

Answer 1: (score: 0)

Try the following:

Load the packages and build a dummy data frame (dates/times stored as character)

library(tidyr)
library(dplyr)
library(stringr)

# tibble() (re-exported by dplyr) replaces the deprecated data_frame()
df_char <- tibble(Group = rep('walking', 3),
                  Start_time = c('2016-10-10T12:02:54+02:00', 
                                 '2016-10-10T12:06:19+02:00', 
                                 '2016-10-10T12:16:55+02:00'),
                  End_time = c('2016-10-10T12:06:18+02:00',
                               '2016-10-10T12:16:47+02:00',
                               '2016-10-10T12:17:14+02:00'))

Check the data frame

glimpse(df_char)
    Observations: 3
    Variables: 3
    $ Group      <chr> "walking", "walking", "walking"
    $ Start_time <chr> "2016-10-10T12:02:54+02:00", "2016-10-10T12:06:19+02:00", "2016-1...
    $ End_time   <chr> "2016-10-10T12:06:18+02:00", "2016-10-10T12:16:47+02:00", "2016-1...

Clean the data, keeping the date and timezone information in case it is needed later

df_char_clean <- df_char %>%
     # Separate Start_time into date and time
     separate(col = Start_time, 
              into = c('Start_date', 'Start_time'),
              sep = '[T]') %>%
     # Remove '+02:00' timezone
     mutate(Start_time = str_extract(string = Start_time,
                                     pattern = '.+(?=[+])')) %>%
     # Separate End_time into Date, time, timezone
     separate(col = End_time, 
              into = c('End_date', 'End_time'),
              sep = '[T]') %>%
     separate(col = End_time,
              into = c('End_time', 'tz'),
              sep = '[+]')

# If you only want times
# select(df_char_clean, 
#        Group, 
#        Start_time, 
#        End_time)

Re-check the data frame

glimpse(df_char_clean)
    Observations: 3
    Variables: 6
    $ Group      <chr> "walking", "walking", "walking"
    $ Start_date <chr> "2016-10-10", "2016-10-10", "2016-10-10"
    $ Start_time <chr> "12:02:54", "12:06:19", "12:16:55"
    $ End_date   <chr> "2016-10-10", "2016-10-10", "2016-10-10"
    $ End_time   <chr> "12:06:18", "12:16:47", "12:17:14"
    $ tz         <chr> "02:00", "02:00", "02:00"
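For comparison, the same date / time / timezone split can be sketched in base R without any packages, which may be handy as a cross-check. This is an illustration rather than part of the answer above; the names `x` and `parts` are made up, and it assumes every value contains exactly one `T` and a positive offset introduced by `+`.

```r
x <- c("2016-10-10T12:02:54+02:00",
       "2016-10-10T12:06:19+02:00",
       "2016-10-10T12:16:55+02:00")

# Split each string at "T" and "+" into three pieces, then stack the
# pieces row-wise into a character matrix.
parts <- do.call(rbind, strsplit(x, "[T+]"))
colnames(parts) <- c("date", "time", "tz")
print(parts)
```

A negative offset such as -05:00 would not be split off by this pattern, so the approach only fits data like the sample shown.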