One column in my data frame looks like this:
> head(df$col2,n = 50)
[1] "NA, 2015" "November 13, 2014" "September 27, 2014" "October 8, 2014" "December 16, 2013"
[6] "February 8, 2015" "November 2, 2014" "November 30, 2014" "February 18, 2015" "August 22, 2014"
[11] "October 26, 2014" "January 3, 2014" "May 5, 2015" "February 3, 2014" "October 15, 2014"
[16] "September 12, 2014" "April 2, 2014" "April 23, 2015" "November 4, 2014" "January 16, 2014"
[21] "September 28, 2014" "January 14, 2014" "February 13, 2014" "January 17, 2014" "January 4, 2014"
[26] "February 1, 2015" "January 14, 2014" "April 18, 2014" "October 14, 2014" "August 20, 2014"
[31] "January 20, 2014" "April 11, 2015" "July 5, 2014" "November 29, 2013" "March 22, 2014"
[36] "December 29, 2014" "February 18, 2015" "January 13, 2014" "January 5, 2015" "April 19, 2014"
[41] "November 28, 2014" "13 August, 2014" "14 December, 2014" "10 January, 2014" "3 February, 2014"
[46] "17 March, 2014" "3 July, 2014" "17 October, 2014" "28 January, 2014" "10 October, 2014"
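As you can see, apart from the first entry (which I know is NA, and that is fine), there are two different date formats: m-d-y and d-m-y. Is there a recommended way to standardize all of the dates to m-d-y?
They are all stored as character strings in my data frame column. I tried
> datestest <- as.Date(df$col2)
but I get
Error in charToDate(x) : character string is not in a standard unambiguous format
as a result.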
Answer 0: (score: 5)
The parse_date_time function in lubridate lets you parse a vector with heterogeneous formats using the "orders" argument:
require(lubridate)
x <- c("November 2, 2014", "13 August, 2014")
parse_date_time(x, orders = c("mdy", "dmy"))
[1] "2014-11-02 UTC" "2014-08-13 UTC"
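To connect this back to the question, here is a minimal sketch of applying the same call to a whole column and then printing every date in one month-day-year form. The vector `col2` is a hypothetical stand-in for `df$col2`, and the `"%m-%d-%Y"` output format is only one assumed reading of "m-d-y".

```r
library(lubridate)

# Hypothetical stand-in for df$col2, including the unparseable first entry
col2 <- c("NA, 2015", "November 13, 2014", "13 August, 2014")

# Entries that match neither order are returned as NA;
# quiet = TRUE suppresses the warning about failed parses
parsed <- as.Date(parse_date_time(col2, orders = c("mdy", "dmy"), quiet = TRUE))

# Render each parsed date as text in a single month-day-year layout
formatted <- format(parsed, "%m-%d-%Y")
```

Keeping `parsed` as a Date column and calling `format()` only for display is usually more convenient, since a Date can still be sorted and subtracted while the formatted strings cannot.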
Answer 1: (score: 2)
Here is a solution using lubridate:
library(lubridate)
x <- c("November 2, 2014", "13 August, 2014" )
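The idea is to use grep to separate the two ways the dates are written (for example, first pick the entries that start with a digit, then use a negative index, -, to pick the rest), and then apply the matching lubridate function to each group.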
ind <- grep("^\\d", x)
dmy(x[ind])
[1] "2014-08-13 UTC"
mdy(x[-ind])
[1] "2014-11-02 UTC"
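The two pieces above still need to be put back in their original positions. A possible sketch of that step follows; it swaps grep for grepl (my substitution, not part of the answer) because `x[-ind]` selects nothing when `ind` is empty, that is, when no entry starts with a digit. The name `out` is hypothetical.

```r
library(lubridate)

x <- c("November 2, 2014", "13 August, 2014", "May 5, 2015")

# Logical mask: TRUE where the string starts with a digit (day-first entries)
day_first <- grepl("^\\d", x)

# Pre-allocate a Date vector so each result lands in its original position
out <- rep(as.Date(NA), length(x))
out[day_first]  <- as.Date(dmy(x[day_first]))
out[!day_first] <- as.Date(mdy(x[!day_first]))
```

The `as.Date()` wrappers are there because the output shown above carries a UTC timestamp; if `dmy()` and `mdy()` already return Date objects in your lubridate version, the wrappers change nothing.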
Answer 2: (score: 1)
I seem to remember there being a cleaner way to do this with lubridate, but I can't recall what it is. In the past, I have used
# grepl() takes the pattern first and the vector to search second
date_type <- ifelse(grepl("\\w{3,9} \\d{1,2}, \\d{4}", df$col2), "mdy",
             ifelse(grepl("\\d{1,2} \\w{3,9}, \\d{4}", df$col2), "dmy",
                    NA))
From there, you can run another ifelse to convert the dates
date <- ifelse(date_type == "mdy",
as.Date(df$col2, format = "%B %d, %Y"),
as.Date(df$col2, format = "%d %B, %Y"))
This will probably return a number, because ifelse drops the Date class, but you can convert it back with as.Date(date, origin = "1970-01-01")
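If you would rather avoid the numeric round trip, one alternative (a sketch, not the approach above) is to fill a pre-allocated Date vector by logical index, which keeps the Date class throughout. The names `col2`, `is_mdy`, `is_dmy` and `dates` are hypothetical, and `%B` assumes an English locale for the month names.

```r
# Hypothetical stand-in for df$col2
col2 <- c("NA, 2015", "November 13, 2014", "13 August, 2014")

# Same patterns as above, anchored so the whole string has to match
is_mdy <- grepl("^\\w{3,9} \\d{1,2}, \\d{4}$", col2)
is_dmy <- grepl("^\\d{1,2} \\w{3,9}, \\d{4}$", col2)

# Start with all NA; entries matching neither pattern stay NA
dates <- rep(as.Date(NA), length(col2))

# %B matches the full month name in the current locale (assumed English here)
dates[is_mdy] <- as.Date(col2[is_mdy], format = "%B %d, %Y")
dates[is_dmy] <- as.Date(col2[is_dmy], format = "%d %B, %Y")
```

Each `as.Date()` call only sees the strings that match its format, so neither call has to cope with the other layout.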