I have a data frame with dates stored as strings, like:
1 Aug 10, 2018
2 Aug 13, 2018
3 Aug 9, 2018
4 Jan 23, 2018
5 Aug 31, 2018
6 Jan 29, 2018
How can I display them as:
1 10/08/2018 #I'm european
2 13/08/2018
3 09/08/2018
4 23/01/2018
etc.
and make the column a Date data type rather than a string?
Answer 0 (score: 1)
See this helpful page or ?strptime.
format(as.Date("Aug 10, 2018", format = "%b %d, %Y"), "%d/%m/%Y")
## [1] "10/08/2018"
Using the df from @Dirk:
df <- data.frame(v = 1:6, d = c("Aug 10, 2018", "Aug 13, 2018", "Aug 9, 2018",
"Jan 23, 2018", "Aug 31, 2018", "Jan 29, 2018"))
df$newd <- format(as.Date(df$d, format = "%b %d, %Y"), "%d/%m/%Y")
# v d newd
# 1 1 Aug 10, 2018 10/08/2018
# 2 2 Aug 13, 2018 13/08/2018
# 3 3 Aug 9, 2018 09/08/2018
# 4 4 Jan 23, 2018 23/01/2018
# 5 5 Aug 31, 2018 31/08/2018
# 6 6 Jan 29, 2018 29/01/2018
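Note that format() returns a character vector, so newd above is a string rather than a Date. A minimal sketch that keeps a real Date column and formats only for display, assuming English month abbreviations (it temporarily switches LC_TIME to "C", because %b matches month names in the current locale); the column name date is hypothetical:

```r
# %b matches month names in the current LC_TIME locale; "C" uses English abbreviations
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

df <- data.frame(v = 1:6,
                 d = c("Aug 10, 2018", "Aug 13, 2018", "Aug 9, 2018",
                       "Jan 23, 2018", "Aug 31, 2018", "Jan 29, 2018"),
                 stringsAsFactors = FALSE)

# Keep a proper Date column (%Y = four-digit year)
df$date <- as.Date(df$d, format = "%b %d, %Y")

# Build the day/month/year string only when displaying
print(format(df$date, "%d/%m/%Y"))

# Restore the previous time locale
invisible(Sys.setlocale("LC_TIME", old_locale))
```

The Date column still prints in ISO 8601 form; the European layout exists only in the formatted output.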
Answer 1 (score: 1)
To make it reproducible:
R> df <- data.frame(v=1:6, d=c("Aug 10, 2018", "Aug 13, 2018", "Aug 9, 2018",
+ "Jan 23, 2018", "Aug 31, 2018", "Jan 29, 2018"))
R> df
v d
1 1 Aug 10, 2018
2 2 Aug 13, 2018
3 3 Aug 9, 2018
4 4 Jan 23, 2018
5 5 Aug 31, 2018
6 6 Jan 29, 2018
R> library(anytime) # parse dates and times without formats
R> df$date <- anydate(df$d) # finds matching format
R> df
v d date
1 1 Aug 10, 2018 2018-08-10
2 2 Aug 13, 2018 2018-08-13
3 3 Aug 9, 2018 <NA>
4 4 Jan 23, 2018 2018-01-23
5 5 Aug 31, 2018 2018-08-31
6 6 Jan 29, 2018 2018-01-29
R>
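The third row fails because of a known shortcoming of the underlying parser in the Boost library; it works as soon as you write the day with two digits, i.e. "Aug 09, 2018".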
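A minimal workaround sketch, assuming every string starts with a three-letter month abbreviation followed by a space: zero-pad single-digit days before handing the strings to anydate(). The helper name pad_day is hypothetical:

```r
# Hypothetical helper: zero-pad a single-digit day in strings like "Aug 9, 2018"
pad_day <- function(x) {
  # "Mon D," becomes "Mon 0D,"; strings that already have a two-digit day do not match
  sub("^([A-Za-z]{3}) ([0-9]),", "\\1 0\\2,", x)
}

d <- c("Aug 10, 2018", "Aug 9, 2018", "Jan 23, 2018")
print(pad_day(d))
# The padded strings can then be passed on, e.g. anytime::anydate(pad_day(df$d))
```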
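To get your desired output, use format() or strftime(), or stick with one of the standardized formats. By default you already get ISO 8601; here we add another one (using another function from the anytime package):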
R> df$fmt <- rfc2822(df$date)
R> df
v d date fmt
1 1 Aug 10, 2018 2018-08-10 Fri, 10 Aug 2018
2 2 Aug 13, 2018 2018-08-13 Mon, 13 Aug 2018
3 3 Aug 09, 2018 2018-08-09 Thu, 09 Aug 2018
4 4 Jan 23, 2018 2018-01-23 Tue, 23 Jan 2018
5 5 Aug 31, 2018 2018-08-31 Fri, 31 Aug 2018
6 6 Jan 29, 2018 2018-01-29 Mon, 29 Jan 2018
R>
Lastly, I would advise against your desired format, as it can be misleading or misread, but for completeness:
R> df$bad <- format(df$date, "%d/%m/%Y")
R> df
v d date fmt bad
1 1 Aug 10, 2018 2018-08-10 Fri, 10 Aug 2018 10/08/2018
2 2 Aug 13, 2018 2018-08-13 Mon, 13 Aug 2018 13/08/2018
3 3 Aug 09, 2018 2018-08-09 Thu, 09 Aug 2018 09/08/2018
4 4 Jan 23, 2018 2018-01-23 Tue, 23 Jan 2018 23/01/2018
5 5 Aug 31, 2018 2018-08-31 Fri, 31 Aug 2018 31/08/2018
6 6 Jan 29, 2018 2018-01-29 Mon, 29 Jan 2018 29/01/2018
R>
Using / as the separator will make people think this is the silly North American m/d/y order. I suggest at least replacing %m with %b.
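A short sketch of that suggestion, assuming English month names are wanted (it temporarily sets LC_TIME to "C", since %b output is locale-dependent):

```r
# %b output depends on LC_TIME; "C" gives English month abbreviations
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

dates <- as.Date(c("2018-08-10", "2018-01-23"))
out <- format(dates, "%d %b %Y")  # day, abbreviated month name, four-digit year
print(out)
# [1] "10 Aug 2018" "23 Jan 2018"

invisible(Sys.setlocale("LC_TIME", old_locale))
```

With the month spelled out, the day/month order can no longer be confused.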
Answer 2 (score: 1)
You can use the lubridate package in R
install.packages("lubridate")
library(lubridate)
Assuming df$time is the column containing the date strings
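One way to apply lubridate here, as a sketch: mdy() parses month-day-year strings and returns Date objects. It assumes English month abbreviations (matching the LC_TIME locale) as in the question; the three-row df is hypothetical example data:

```r
library(lubridate)

# Hypothetical example data; df$time holds the date strings
df <- data.frame(time = c("Aug 10, 2018", "Aug 9, 2018", "Jan 23, 2018"),
                 stringsAsFactors = FALSE)

# mdy() parses month-day-year strings into a Date vector
df$date <- mdy(df$time)

# Format only for display, as day/month/year
print(format(df$date, "%d/%m/%Y"))
```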