Trouble formatting multiple date styles with lubridate

Time: 2019-05-19 03:57:48

Tags: r date formatting lubridate

I am cleaning a column that contains the date of each record. The dates appear in several different formats, and I need to convert them all into one consistent format.

I tried using lubridate and its parse_date_time() function. I also tried with the column stored as character and as a factor.

This is what the date column looks like (it has over 100,000 rows):

Date.of.Record 
2018-01-01     
20180102     
2018/01/03  
2018-01-04  
2018-01-05
20180106 

And I'd like to format them like this:

Date.of.Record 
20180101     
20180102     
20180103  
20180104  
20180105
20180106 

And this is the code I tried:

library(lubridate)
date <- parse_date_time(bind$Date.of.Record, orders =c(ymd()))
date2 <- as.Date(bind$Date.of.Record, "%yyyy-%mm-%dd")

The code for 'date' doesn't work at all, and the code for 'date2' produces all NAs.

I realize that I could split the data into separate datasets by date format and then recombine them after formatting each one, but I expect there is a much more efficient way to do this. I am still new to R and am trying to learn the best way to work with large datasets.

Thanks for your help!!!

1 answer:

Answer 0 (score: 0)

One option would be anydate from the anytime package:

library(anytime)
bind$Date.of.Record <- format(anydate(bind$Date.of.Record), "%Y%m%d")
bind$Date.of.Record
#[1] "20180101" "20180102" "20180103" "20180104" "20180105" "20180106"

If the result needs to be numeric, wrap the call in as.numeric.
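As a minimal sketch of that numeric conversion, assuming the anytime package is installed and using the sample data frame bind from the data section of this answer (date_num is just an illustrative name, not something from the question):

```r
library(anytime)

# Sample data copied from the "data" section of this answer
bind <- data.frame(
  Date.of.Record = c("2018-01-01", "20180102", "2018/01/03",
                     "2018-01-04", "2018-01-05", "20180106"),
  stringsAsFactors = FALSE
)

# Parse the mixed formats, format as yyyymmdd text, then convert to numeric
date_num <- as.numeric(format(anydate(bind$Date.of.Record), "%Y%m%d"))

is.numeric(date_num)
```

Keeping the values as character may be preferable if they are only used as labels or keys. The numeric form mainly helps with sorting or range comparisons.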


With parse_date_time, the orders argument should be a character string such as "ymd", not a call to the ymd() function:

library(lubridate)
format(parse_date_time(bind$Date.of.Record, orders = "ymd"), "%Y%m%d")
#[1] "20180101" "20180102" "20180103" "20180104" "20180105" "20180106"
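With over 100,000 rows it may be worth checking that every value was actually parsed. The following is only a sketch: raw is a hypothetical sample made up of the six values from the question plus one deliberately bad entry, stored as a factor because the question mentions trying that. Values that cannot be parsed come back as NA.

```r
library(lubridate)

# Hypothetical sample: the question's six values plus one invalid entry
raw <- factor(c("2018-01-01", "20180102", "2018/01/03",
                "2018-01-04", "2018-01-05", "20180106", "not a date"))

# Convert to character first so the text labels are what gets parsed;
# suppressWarnings() hides the "failed to parse" warning for the bad entry
parsed <- suppressWarnings(parse_date_time(as.character(raw), orders = "ymd"))

# Positions and original values of the rows that could not be parsed
failed <- which(is.na(parsed))
as.character(raw)[failed]

# Rows that parsed are formatted as yyyymmdd; failed rows stay NA
formatted <- format(parsed, "%Y%m%d")
```

Inspecting the failed values first makes it easier to decide whether orders needs additional formats or whether the source data has to be corrected.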

data

bind <- structure(list(Date.of.Record = c("2018-01-01", "20180102", "2018/01/03", 
 "2018-01-04", "2018-01-05", "20180106")), class = "data.frame", 
 row.names = c(NA, -6L))