Convert date-time strings and sort by date

Time: 2018-07-27 21:44:03

Tags: r sorting date posixct

I keep running into the same problem: converting strings that represent date-times into an object that R recognizes as a date-time (POSIXct?).

I have a character vector of date-times that looks like this:

 [1] "Thu Apr 19 00:42:24 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 02 12:42:07 +0000 2015"
 [4] "Wed Apr 25 02:24:49 +0000 2018" "Sun Apr 03 00:37:19 +0000 2016" "Fri Apr 11 10:02:42 +0000 2014"
 [7] "Tue Jan 09 13:57:33 +0000 2018" "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017"
[10] "Thu Oct 05 03:41:32 +0000 2017"

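My goal is to sort these values so that the most recent date is at the top and the oldest date is at the bottom. As far as I can tell, this involves converting the strings into date-time objects, but I can't even get that step to work.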

I tried:

lubridate::as_date(dates[1], tz = "UTC", format = NULL)
as.POSIXct(dates[1], tz = "UTC")

But I always get the following error:

Error in as.POSIXlt.character(x, tz, ...) : 
character string is not in a standard unambiguous format

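I assume this can be solved by specifying the `format` argument, but how do I do that? And once the strings are converted (or without converting them, if that isn't necessary), how do I sort these dates?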

Any help would be greatly appreciated. Thanks in advance!

2 answers:

Answer 0 (score: 3)

Alternatively, we can use `order(as.Date())` with an explicit `format`:

> dt[order(as.Date(dt, format="%a %b %d %H:%M:%S %z %Y"))]
 [1] "Fri Apr 11 10:02:42 +0000 2014" "Thu Apr 02 12:42:07 +0000 2015" "Sun Apr 03 00:37:19 +0000 2016"
 [4] "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017" "Thu Oct 05 03:41:32 +0000 2017"
 [7] "Tue Jan 09 13:57:33 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 19 00:42:24 +0000 2018"
[10] "Wed Apr 25 02:24:49 +0000 2018"

Data

dt <- c("Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
        "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
        "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
        "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
        "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017")
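The answer above sorts oldest first, and because `as.Date()` keeps only the calendar date, two timestamps on the same day would not be ordered by time of day. Below is a minimal sketch that keeps the time and puts the newest date on top, using the same format string with `as.POSIXct()` and `order(..., decreasing = TRUE)`. In that format, `%a` and `%b` are the abbreviated weekday and month names, `%d` is the day, `%H:%M:%S` is the time, `%z` matches the `+0000` offset and `%Y` is the four-digit year. Since `%a`/`%b` depend on the locale, the sketch temporarily switches `LC_TIME` to the C locale; that step is an assumption about your setup and can be dropped if your session already uses English day and month names.

```r
# Sample strings from the question
dt <- c("Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
        "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
        "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
        "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
        "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017")

# %a and %b expect English abbreviations, so parse in the C locale
# (assumption: only needed when the session locale is not English)
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.POSIXct(dt, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Full date-times (not just dates), newest first
dt_sorted <- dt[order(parsed, decreasing = TRUE)]
print(dt_sorted)
```

If you want the date-time objects themselves rather than the original strings, `sort(parsed, decreasing = TRUE)` returns the sorted POSIXct vector.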

Answer 1 (score: 0)

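Here is one approach: use a regular expression to drop the weekday and the redundant `+0000` and to move the year next to the month and day, then let lubridate's parser produce the output. This may suit you if you would rather write a regex than remember `strptime` format codes.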
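To see what the replacement step produces before lubridate parses it, here is a minimal sketch that applies the same pattern to a single string. It swaps in base R's `sub()` with `perl = TRUE` for `stringr::str_replace_all()` only so the snippet needs no packages; with one anchored match per string, the two give the same result here.

```r
x <- "Thu Apr 19 00:42:24 +0000 2018"

# Capture groups: 1 = "Thu " (weekday), 2 = "Apr 19 " (month and day),
#                 3 = "00:42:24" (time), 4 = " +0000 " (offset), 5 = "2018" (year)
# The replacement keeps month/day, then year, then time; groups 1 and 4 are dropped.
rearranged <- sub("(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$", "\\2\\5 \\3",
                  x, perl = TRUE)
print(rearranged)  # "Apr 19 2018 00:42:24"
```

The result is in month-day-year order, which is why `mdy_hms()` is the matching lubridate parser.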

library(stringr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
dates <- c(
  "Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
  "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
  "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
  "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
  "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017"
)

dates %>%
  str_replace_all("(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$", "\\2\\5 \\3") %>%
  mdy_hms()
#>  [1] "2018-04-19 00:42:24 UTC" "2018-04-14 03:08:30 UTC"
#>  [3] "2015-04-02 12:42:07 UTC" "2018-04-25 02:24:49 UTC"
#>  [5] "2016-04-03 00:37:19 UTC" "2014-04-11 10:02:42 UTC"
#>  [7] "2018-01-09 13:57:33 UTC" "2016-04-13 09:45:05 UTC"
#>  [9] "2017-05-18 11:26:10 UTC" "2017-10-05 03:41:32 UTC"

Created on 2018-07-27 by the reprex package (v0.2.0).
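The output above is parsed but still in the input order, while the question asks for the newest date first. A minimal sketch of that last step, assuming lubridate is installed; the rearrangement uses base `sub()` instead of the stringr pipe so that only one package is required:

```r
library(lubridate)

dates <- c(
  "Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
  "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
  "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
  "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
  "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017"
)

# Same regex as the answer: "Thu Apr 19 00:42:24 +0000 2018" -> "Apr 19 2018 00:42:24"
rearranged <- sub("(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$", "\\2\\5 \\3",
                  dates, perl = TRUE)
parsed <- mdy_hms(rearranged)  # POSIXct, UTC by default

# Sorted date-time objects, newest first
newest_first <- sort(parsed, decreasing = TRUE)

# Or reorder the original strings instead
dates_sorted <- dates[order(parsed, decreasing = TRUE)]
print(dates_sorted)
```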
