Converting string dates to dates in R

Time: 2018-12-14 15:04:35

Tags: r

I have a data frame that contains dates stored as strings:

1    Aug 10, 2018
2    Aug 13, 2018
3    Aug 9,  2018 
4    Jan 23, 2018
5    Aug 31, 2018
6    Jan 29, 2018

How can I display them as:

1   10/08/2018 #I'm european
2   13/08/2018
3   09/08/2018
4   23/01/2018

and make the column a Date data type rather than a string?

3 answers:

Answer 0: (score: 1)

See the helpful page ?strptime.

format(as.Date("Aug 10, 2018", format = "%b %d, %Y"), "%d/%m/%Y")
## [1] "10/08/2018"

Using the df from @Dirk:

df <- data.frame(v = 1:6, d = c("Aug 10, 2018", "Aug 13, 2018", "Aug 9,  2018", 
                                "Jan 23, 2018", "Aug 31, 2018", "Jan 29, 2018"))
df$newd <- format(as.Date(df$d, format = "%b %d, %Y"), "%d/%m/%Y")
#   v            d       newd
# 1 1 Aug 10, 2018 10/08/2018
# 2 2 Aug 13, 2018 13/08/2018
# 3 3 Aug 9,  2018 09/08/2018
# 4 4 Jan 23, 2018 23/01/2018
# 5 5 Aug 31, 2018 31/08/2018
# 6 6 Jan 29, 2018 29/01/2018
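
As a minimal sketch of the same base R approach, the snippet below keeps a real Date column next to the formatted string, since format() returns character rather than Date. It also shows how %Y (four-digit year) and %y (two-digit year) differ on this input. The data frame and the column names date and label are made up for illustration, and fixing LC_TIME to "C" is an assumption made so that %b matches English month abbreviations.

```r
# Assumption: force English month abbreviations so "%b" matches "Aug"/"Jan"
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data frame mirroring the question's data
df <- data.frame(d = c("Aug 10, 2018", "Aug 9,  2018", "Jan 23, 2018"),
                 stringsAsFactors = FALSE)

# Parse to a real Date column; %Y reads the four-digit year
df$date <- as.Date(df$d, format = "%b %d, %Y")

# Formatted copy for display only; this column is character, not Date
df$label <- format(df$date, "%d/%m/%Y")

class(df$date)   # "Date"
class(df$label)  # "character"

# Pitfall: %y reads only two digits ("20") and ignores the rest of "2018"
wrong <- as.Date("Aug 10, 2018", format = "%b %d, %y")
format(wrong)    # "2020-08-10"
```

Keeping the Date column means sorting and date arithmetic still work; the dd/mm/yyyy string is only needed at the point of display.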

Answer 1: (score: 1)

To make this reproducible:

R> df <- data.frame(v=1:6, d=c("Aug 10, 2018", "Aug 13, 2018", "Aug 9,  2018", 
+                              "Jan 23, 2018", "Aug 31, 2018", "Jan 29, 2018"))
R> df
  v            d
1 1 Aug 10, 2018
2 2 Aug 13, 2018
3 3 Aug 9,  2018
4 4 Jan 23, 2018
5 5 Aug 31, 2018
6 6 Jan 29, 2018
R> library(anytime)              # parse dates and times without formats
R> df$date <- anydate(df$d)      # finds matching format
R> df
  v            d       date
1 1 Aug 10, 2018 2018-08-10
2 2 Aug 13, 2018 2018-08-13
3 3 Aug 9,  2018       <NA>
4 4 Jan 23, 2018 2018-01-23
5 5 Aug 31, 2018 2018-08-31
6 6 Jan 29, 2018 2018-01-29
R> 

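The NA in the third row is a known flaw in the underlying parser from the Boost library. Once you write the day with two digits, i.e. "Aug 09, 2018", it works.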
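
One way to apply that workaround without editing the data by hand is to zero-pad single-digit days first. The following is only a sketch in base R: the vector x and the regular expressions are illustrative assumptions, and the final call to anydate() is left as a comment because it needs the anytime package.

```r
# Hypothetical input with a single-digit day and a doubled space
x <- c("Aug 10, 2018", "Aug 9,  2018", "Jan 23, 2018")

# Zero-pad a single-digit day: "Aug 9," becomes "Aug 09,"
padded <- sub("^([A-Za-z]+) ([0-9]),", "\\1 0\\2,", x)

# Collapse runs of spaces into one
padded <- gsub(" +", " ", padded)

padded
# df$date <- anytime::anydate(padded)  # requires the anytime package
```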

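To get the desired output, you can use format() or strptime(), or stick with one of the standardized formats. By default you already get ISO 8601; here we add another one (using another function from the anytime package):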

R> df$fmt <- rfc2822(df$date)
R> df
  v             d       date              fmt
1 1  Aug 10, 2018 2018-08-10 Fri, 10 Aug 2018
2 2  Aug 13, 2018 2018-08-13 Mon, 13 Aug 2018
3 3 Aug 09,  2018 2018-08-09 Thu, 09 Aug 2018
4 4  Jan 23, 2018 2018-01-23 Tue, 23 Jan 2018
5 5  Aug 31, 2018 2018-08-31 Fri, 31 Aug 2018
6 6  Jan 29, 2018 2018-01-29 Mon, 29 Jan 2018
R> 

Lastly, I would advise against the format you asked for, because it can be misleading or misread, but for completeness:

R> df$bad <- format(df$date, "%d/%m/%Y")
R> df
  v             d       date              fmt        bad
1 1  Aug 10, 2018 2018-08-10 Fri, 10 Aug 2018 10/08/2018
2 2  Aug 13, 2018 2018-08-13 Mon, 13 Aug 2018 13/08/2018
3 3 Aug 09,  2018 2018-08-09 Thu, 09 Aug 2018 09/08/2018
4 4  Jan 23, 2018 2018-01-23 Tue, 23 Jan 2018 23/01/2018
5 5  Aug 31, 2018 2018-08-31 Fri, 31 Aug 2018 31/08/2018
6 6  Jan 29, 2018 2018-01-29 Mon, 29 Jan 2018 29/01/2018
R> 

Using / as the separator makes people think this is the silly North American m/d/y order. I recommend you at least replace %m with %b.
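
A short sketch of that suggestion, assuming a C (English) LC_TIME locale so that %b prints English month abbreviations; the variable names d and unambiguous are illustrative only.

```r
# Assumption: English month abbreviations for "%b"
invisible(Sys.setlocale("LC_TIME", "C"))

d <- as.Date(c("2018-08-10", "2018-08-09", "2018-01-23"))

# Day, abbreviated month name, four-digit year: cannot be misread as m/d/y
unambiguous <- format(d, "%d %b %Y")
unambiguous
```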

Answer 2: (score: 1)

You can use the lubridate package in R.

install.packages("lubridate")

library(lubridate)

Assuming df$time is the column with the date strings
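
The answer breaks off here, so what follows is only one possible continuation, not necessarily what the author had in mind. It is a sketch that uses lubridate's mdy() parser, which expects month-day-year order. The data frame contents and the column names date and shown are assumptions made for illustration, and month-name matching depends on the LC_TIME locale.

```r
library(lubridate)  # assumes the package is installed

# Hypothetical data frame; df$time holds the date strings
df <- data.frame(time = c("Aug 10, 2018", "Aug 9,  2018", "Jan 23, 2018"),
                 stringsAsFactors = FALSE)

# Parse month-day-year strings into a Date column
df$date <- mdy(df$time)

# Character copy in dd/mm/yyyy, for display only
df$shown <- format(df$date, "%d/%m/%Y")

class(df$date)  # "Date"
```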