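I have a data frame in which the date field is a factor holding a mix of values like the ones below. How can I standardize them, convert them to a date format, and extract the month and year?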
1 Oct 24 2013 3:59PM
2 Nov 5 2013 3:00PM
3 Nov 26 2013 1:00PM
4 2015-05-05 21:09:00
5 Nov 19 2013 1:00PM
6 2015-05-28 20:23:00
7 2015-05-28 20:24:00
8 Nov 12 2013 1:00PM
9 2015-05-28 20:29:00
10 2015-05-28 20:26:00
Answer 0 (score: 1)
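You could try parse_date_time() from the lubridate package. I find it handles multiple formats much more easily; it is just a matter of tweaking the orders argument. Here we can use c("mdyR", "ymdT") as the orders vector: "mdyR" covers values such as "Oct 24 2013 3:59PM" (month, day, year, then hours and minutes with an optional AM/PM), and "ymdT" covers values such as "2015-05-05 21:09:00" (year, month, day, then hours, minutes, and seconds).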
library(lubridate)
parse_date_time(df$V1, c("mdyR", "ymdT"))
# [1] "2013-10-24 15:59:00 UTC" "2013-11-05 15:00:00 UTC"
# [3] "2013-11-26 13:00:00 UTC" "2015-05-05 21:09:00 UTC"
# [5] "2013-11-19 13:00:00 UTC" "2015-05-28 20:23:00 UTC"
# [7] "2015-05-28 20:24:00 UTC" "2013-11-12 13:00:00 UTC"
# [9] "2015-05-28 20:29:00 UTC" "2015-05-28 20:26:00 UTC"
To extract the month and year, we can do the following:
pdt <- parse_date_time(df$V1, c("mdyR", "ymdT"))
month(pdt)
# [1] 10 11 11 5 11 5 5 11 5 5
year(pdt)
# [1] 2013 2013 2013 2015 2013 2015 2015 2013 2015 2015
Data:
df <- structure(list(V1 = structure(c(10L, 9L, 8L, 1L, 7L, 2L, 3L,
6L, 5L, 4L), .Label = c("2015-05-05 21:09:00", "2015-05-28 20:23:00",
"2015-05-28 20:24:00", "2015-05-28 20:26:00", "2015-05-28 20:29:00",
"Nov 12 2013 1:00PM", "Nov 19 2013 1:00PM", "Nov 26 2013 1:00PM",
"Nov 5 2013 3:00PM", "Oct 24 2013 3:59PM"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-10L))
Answer 1 (score: 1)
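Check whether each value can be parsed with one format (in this case "%Y-%m-%d %H:%M:%S", the default of as.POSIXct.factor), and fall back to the other format when it cannot: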
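dats$dt2 <- as.POSIXct(  # sapply() simplifies the POSIXct results to plain numbers (seconds since 1970-01-01), so convert back
  sapply(trimws(dats$dt),  # trimws() (base R) strips stray spaces; sjmisc::trim() works too
         function(d) if (!is.na(strptime(d, "%Y-%m-%d %H:%M:%S"))) {
           as.POSIXct(d)  # ISO-style value, e.g. "2015-05-05 21:09:00"
         } else {
           as.POSIXct(d, format = "%b %d %Y %I:%M%p")  # e.g. "Oct 24 2013 3:59PM"; %p only applies with %I, not %H
         }),
  origin = "1970-01-01")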
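The same fallback idea can also be written without sapply(), which keeps the POSIXct class and makes the origin argument unnecessary. The sketch below parses every value with the first format, then re-parses only the entries that came back NA with the second format. parse_mixed() is a hypothetical helper name (not from base R or any package); the tz = "UTC" default and the English/C time locale (needed so that %b matches "Oct" and %p matches "PM") are assumptions.

```r
# parse_mixed() is a hypothetical helper, not part of base R or any package.
# Assumes an English/C LC_TIME locale so %b ("Oct") and %p ("PM") are recognised.
parse_mixed <- function(x, tz = "UTC") {
  x <- trimws(as.character(x))  # factor -> character, strip stray spaces
  # First pass: ISO-style values such as "2015-05-05 21:09:00"
  out <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # Second pass: only the entries the first format could not parse,
  # e.g. "Oct 24 2013 3:59PM" (%I = 12-hour clock, required for %p to apply)
  miss <- is.na(out)
  out[miss] <- as.POSIXct(x[miss], format = "%b %d %Y %I:%M%p", tz = tz)
  out
}

# Usage on a few of the question's values (assumed English month/AM-PM names)
invisible(Sys.setlocale("LC_TIME", "C"))
df <- data.frame(V1 = factor(c("Oct 24 2013 3:59PM", "2015-05-05 21:09:00",
                               "Nov 5 2013 3:00PM")))
df$dt    <- parse_mixed(df$V1)
df$month <- as.integer(format(df$dt, "%m"))
df$year  <- as.integer(format(df$dt, "%Y"))
df$month  # 10  5 11
df$year   # 2013 2015 2013
```

Both passes are vectorized, so this scales to large columns better than a per-element sapply() call; any value matching neither format is left as NA rather than raising an error.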