I want to convert all the mixed date formats into a single format, for example d-m-y.
Here is the data frame:
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I tried the following code, but it only gives NAs:
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
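The NAs most likely come from the format string. as.Date() only accepts a string when `format` reproduces it literally, separators included, so "%m%d%y" would expect something like "052784". The small illustration below uses the fourth birthdate and is my own, not from the question. As far as I know, `origin` is only used when the input is a number, so it has no effect on character input like this.

```r
# "%m%d%y" has no separators, so "05/27/84" does not match and gives NA
as.Date("05/27/84", format = "%m%d%y")    # NA

# Writing the slashes into the format string makes it match
as.Date("05/27/84", format = "%m/%d/%y")  # "1984-05-27" (two-digit year 84 -> 1984)
```

Because each format string fits only one layout, a column that mixes several layouts has to be parsed piece by piece.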
Then I tried parsing with lubridate, but that also gives NAs, which means the parsing failed:
require(lubridate)
parse_date_time(x$Birthdate, orders = c("ymd", "mdy"))
[1] NA               NA               "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
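I also noticed that the first date in the data frame is in the format "36085.0". I found the code below, but I still don't understand what the numbers mean or what "origin" means: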
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
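To make the numbers and `origin` concrete: when as.Date() receives a number, it returns the date that many days after `origin`. Excel stores dates the same way, as "serial numbers", and the right origin depends on the workbook's date system. The sketch below assumes the usual R conventions for Excel: origin = "1899-12-30" for the 1900 date system and "1904-01-01" for the 1904 date system. The odd-looking 1899-12-30 compensates for Excel treating 1900 as a leap year.

```r
# A number is read as "days after origin"
as.Date(0, origin = "1970-01-01")        # "1970-01-01"
as.Date(1, origin = "1970-01-01")        # "1970-01-02"

# The snippet from the question: Excel serial numbers, 1900 date system
dates <- c(30829, 38540)
as.Date(dates, origin = "1899-12-30")    # "1984-05-27" "2005-07-07"

# The same serial number gives different dates under the two Excel systems
as.Date(36085, origin = "1899-12-30")    # "1998-10-17" (1900 system)
as.Date(36085, origin = "1904-01-01")    # "2002-10-18" (1904 system)
```

The two systems differ by 1462 days, so which reading of "36085.0" is correct depends on the spreadsheet it came from.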
P.S. I'm very new to R, so simpler explanations would be much appreciated. Thank you!
Answer:
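You should parse each format separately. For each format, use a regular expression to select the matching rows, convert only those rows, and then move on to the next format. I'll answer with data.table instead of data.frame, because I've forgotten how to do this with data.frame.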
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
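# handle dates like "36085.0": Excel serial numbers, i.e. days since 1904-01-01 (Excel's 1904 date system)
# or since 1899-12-30 in R terms (Excel's 1900 date system); the code below assumes the 1904 system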
# see https://docs.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
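Since the answer mentions not remembering the data.frame syntax, here is a sketch of the same three steps on a plain data.frame, assuming lubridate is installed. It ends with the d-m-y format the question asked for. The column name `Birthdate_dmy` is made up for this example, and the 1904 origin is carried over from the answer as an assumption.

```r
library(lubridate)  # ymd() and mdy(), as in the answer

x <- data.frame(Name = c("A", "B", "C", "D", "E"),
                Birthdate = c("36085.0", "2001-sep-12", "Feb-18-2005", "05/27/84", "2020-6-25"),
                stringsAsFactors = FALSE)

# Start from an all-NA Date column and fill it in one format at a time
x$Birthdate1 <- as.Date(rep(NA_character_, nrow(x)))

# "2001-sep-12", "2020-6-25": four digits then a dash -> year-month-day
rows <- grepl("^[0-9]{4}-", x$Birthdate)
x$Birthdate1[rows] <- ymd(x$Birthdate[rows])

# "36085.0": only digits and dots -> Excel serial number (1904 system assumed)
rows <- grepl("^[0-9.]+$", x$Birthdate)
x$Birthdate1[rows] <- as.Date(as.numeric(x$Birthdate[rows]), origin = "1904-01-01")

# Whatever is still NA ("Feb-18-2005", "05/27/84") -> month-day-year
rows <- is.na(x$Birthdate1)
x$Birthdate1[rows] <- mdy(x$Birthdate[rows])

# Day-month-year text; format() returns character, not a Date
x$Birthdate_dmy <- format(x$Birthdate1, "%d-%m-%Y")
x
```

Keep `Birthdate1` for sorting or date arithmetic and use the formatted text only for display. "%d-%m-%y" would give a two-digit year instead.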