I have a data frame "movesdata" with columns such as Start_time, for example:
Group Start_time End_time
walking 2016-10-10T12:02:54+02:00 2016-10-10T12:06:18+02:00
walking 2016-10-10T12:06:19+02:00 2016-10-10T12:16:47+02:00
walking 2016-10-10T12:16:55+02:00 2016-10-10T12:17:14+02:00
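I want to split the values of the Start_time and End_time columns, e.g. "2016-10-10T12:02:54+02:00", so that only the time 12:02:54 remains. I want to drop everything else, but I can't figure out how. The complication is that the date changes every 3-4 rows, while the GMT offset (+02:00) is always the same. I want neither the date (2016-10-10T) nor the GMT offset (+02:00). Can anyone help me?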
Answer 0 (score: 0)
Two approaches. The first is to actually use date conversion:
> d <- strptime ("2016-10-10T12:02:54+02:00", "%Y-%m-%dT%H:%M:%S+02:00")
> d
[1] "2016-10-10 12:02:54 EDT"
> format (d, "%H:%M:%S")
[1] "12:02:54"
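I cheated a little at the end here: strptime expects a time zone offset in the form +0200 rather than +02:00, so I matched "+02:00" as literal text and assumed the offset stays constant.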
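If the offset is not always `+02:00`, one option (a sketch, not what this answer does) is to delete the colon in the offset so it matches the `+0200` form that `%z` reads on input. Be aware that with `%z` R adjusts the time by the offset, so the clock time you get back is expressed in the `tz` you request (UTC here), not the local time written in the string. The second, negative-offset timestamp is a made-up example.

```r
x <- c("2016-10-10T12:02:54+02:00", "2016-10-10T12:06:19-05:00")

# Turn "+02:00" into "+0200" so that %z can read the offset
x_z <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# %z shifts each time by its own offset; the result is shown in UTC
parsed <- strptime(x_z, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
utc_clock <- format(parsed, "%H:%M:%S")
print(utc_clock)
```

Use this only if you want the times normalized to one zone; if you want the wall-clock time exactly as written, the literal `+02:00` trick above is the simpler choice.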
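The second approach is to use regular expressions (grep and friends), which you really should learn, although in this case it gets fairly complicated: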
> gsub ("^.+T([0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\\+.+$", "\\1", "2016-10-10T12:02:54+02:00")
[1] "12:02:54"
(grep is actually a bit awkward for this (common) use case in R, so I used gsub instead; it is simpler here.)
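Another base R way to apply a regular expression here (an alternative to the `gsub` call, not what the answer uses) is to extract the first `hh:mm:ss` match directly with `regexpr` and `regmatches`, which works on a whole vector at once:

```r
x <- c("2016-10-10T12:02:54+02:00",
       "2016-10-10T12:06:19+02:00",
       "2016-10-10T12:16:55+02:00")

# regexpr() locates the first hh:mm:ss run in each string;
# regmatches() returns the matched substrings.
# Note: strings with no match are silently dropped from the result.
times <- regmatches(x, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", x))
print(times)
```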
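I recommend the first option here, because it flags invalid dates (strptime returns NA for them), whereas the second option happily accepts a time like 25:37:99. You may be confident that the incoming times are all correct, but defensive programming never hurts. And of course, if you need to reformat dates/times, it is better to use the date/time functions.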
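A quick way to see that difference, using the same format string and regex as the two approaches above (the invalid timestamp is made up for illustration): `strptime` returns `NA` for an impossible time, while the regex extracts it unchanged.

```r
bad <- "2016-10-10T25:37:99+02:00"   # hypothetical invalid timestamp

# Date/time parsing: hour 25 and second 99 are out of range, so the result is NA
parsed <- strptime(bad, "%Y-%m-%dT%H:%M:%S+02:00", tz = "UTC")
print(is.na(parsed))

# Regex extraction: only the pattern is checked, so the bad time passes through
via_regex <- gsub("^.+T([0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\\+.+$", "\\1", bad)
print(via_regex)

# Defensive check after parsing a whole column
if (any(is.na(parsed))) message("some timestamps could not be parsed")
```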
Keep in mind that you can do this as a vectorized operation:
movesdata$startTime <- format (strptime (movesdata$Start_time, "%Y-%m-%dT%H:%M:%S+02:00"), "%H:%M:%S")
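I only used a single string above for illustration. (I gave the target column a different name so you can compare the two. I tend to keep the original column until I know the operation works as I expect.)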
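Applied to both columns of the question's data, the vectorized version might look like the sketch below. The small `movesdata` is rebuilt from the three rows shown in the question, and the helper `to_clock` and the new column names `start_clock`/`end_clock` are hypothetical. Passing `tz = "UTC"` keeps the parsed value from being labelled with the session's local zone (the `EDT` in the console output above); the extracted clock time is the same either way.

```r
movesdata <- data.frame(
  Group = rep("walking", 3),
  Start_time = c("2016-10-10T12:02:54+02:00",
                 "2016-10-10T12:06:19+02:00",
                 "2016-10-10T12:16:55+02:00"),
  End_time = c("2016-10-10T12:06:18+02:00",
               "2016-10-10T12:16:47+02:00",
               "2016-10-10T12:17:14+02:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match "+02:00" literally and keep only hh:mm:ss
to_clock <- function(x) {
  format(strptime(x, "%Y-%m-%dT%H:%M:%S+02:00", tz = "UTC"), "%H:%M:%S")
}

# Keep the original columns and write the results to new ones
movesdata$start_clock <- to_clock(movesdata$Start_time)
movesdata$end_clock   <- to_clock(movesdata$End_time)
print(movesdata[, c("Group", "start_clock", "end_clock")])
```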
Answer 1 (score: 0)
Try the following:
Load the packages and create a dummy data frame (times/dates stored as character):
library(tidyr)
library(dplyr)
library(stringr)
# tibble() replaces the deprecated data_frame()
df_char <- tibble(Group = rep('walking', 3),
                  Start_time = c('2016-10-10T12:02:54+02:00',
                                 '2016-10-10T12:06:19+02:00',
                                 '2016-10-10T12:16:55+02:00'),
                  End_time = c('2016-10-10T12:06:18+02:00',
                               '2016-10-10T12:16:47+02:00',
                               '2016-10-10T12:17:14+02:00'))
Check the data frame:
glimpse(df_char)
Observations: 3
Variables: 3
$ Group <chr> "walking", "walking", "walking"
$ Start_time <chr> "2016-10-10T12:02:54+02:00", "2016-10-10T12:06:19+02:00", "2016-1...
$ End_time <chr> "2016-10-10T12:06:18+02:00", "2016-10-10T12:16:47+02:00", "2016-1...
Clean the data, keeping the date and time zone information in case you need it later:
df_char_clean <- df_char %>%
  # Separate Start_time into date and time
  separate(col = Start_time,
           into = c('Start_date', 'Start_time'),
           sep = '[T]') %>%
  # Remove the '+02:00' time zone from Start_time
  mutate(Start_time = str_extract(string = Start_time,
                                  pattern = '.+(?=[+])')) %>%
  # Separate End_time into date, time and time zone
  separate(col = End_time,
           into = c('End_date', 'End_time'),
           sep = '[T]') %>%
  separate(col = End_time,
           into = c('End_time', 'tz'),
           sep = '[+]')
# If you only want the times:
# select(df_char_clean,
#        Group,
#        Start_time,
#        End_time)
Re-check the data frame:
glimpse(df_char_clean)
Observations: 3
Variables: 6
$ Group <chr> "walking", "walking", "walking"
$ Start_date <chr> "2016-10-10", "2016-10-10", "2016-10-10"
$ Start_time <chr> "12:02:54", "12:06:19", "12:16:55"
$ End_date <chr> "2016-10-10", "2016-10-10", "2016-10-10"
$ End_time <chr> "12:06:18", "12:16:47", "12:17:14"
$ tz <chr> "02:00", "02:00", "02:00"
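Because the pipeline above splits on `+`, the sign of the offset is lost (the `tz` column shows `02:00`), and a timestamp with a negative offset such as `-05:00` would not be split correctly. A base R sketch (an alternative to the tidyr code, not part of the original answer; the second timestamp is made up) that keeps the sign by capturing date, time and offset with one regular expression via `regexec`/`regmatches`:

```r
x <- c("2016-10-10T12:02:54+02:00", "2016-10-10T12:06:19-05:00")

# Capture groups: 1 = date, 2 = time, 3 = signed UTC offset
pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})([+-][0-9]{2}:[0-9]{2})$"
parts <- regmatches(x, regexec(pattern, x))

# Each element is c(full_match, date, time, offset); drop the full match.
# Strings that do not match yield character(0) and are dropped by rbind.
split_df <- as.data.frame(do.call(rbind, lapply(parts, `[`, -1)),
                          stringsAsFactors = FALSE)
names(split_df) <- c("date", "time", "tz")
print(split_df)
```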