Question: how can I convert a factor to a date object without getting NA values?
Here's a similar post: Convert Factor to Date/Time in R
In that post, the user converted the factor to a character object before converting it to a date. I am getting NA values when converting to a character object using as.character inside the as.Date function.
I have a column in the data frame that stores dates as a factor, with each date occurring a different number of times. Here's the information contained in the data.frame.
> head(fraud, 5)
TRANSACTION.DATE TRANSACTION.AMOUNT AIR.TRAVEL.DATE POSTING.DATE
1 2/27/14 25.00 <NA> 2/28/14
2 2/28/14 25.00 <NA> 2/28/14
3 2/27/14 25.00 <NA> 2/28/14
4 2/27/14 20.00 2/27/14 2/28/14
5 2/27/14 12.13 <NA> 2/28/14
> str(fraud$TRANSACTION.DATE)
Factor w/ 519 levels "1/1/14","1/1/15",..: 228 230 228 228 228 230 226 228 230 228 ...
> summary(fraud$TRANSACTION.DATE, 5)
9/30/14 9/17/14 11/4/14 9/23/14 (Other)
197 187 171 160 19221
Converting the factor to a date object resulted in NA values.
> fraud$TRANSACTION.DATE <- as.Date(as.character(fraud$TRANSACTION.DATE),
+ format = "%m/%d/%Y")
> head(fraud$TRANSACTION.DATE, 5)
[1] NA NA NA NA NA
Checking whether the as.character function worked:
> fraud$TRANSACTION.DATE <- as.character(fraud$TRANSACTION.DATE)
> head(fraud$TRANSACTION.DATE)
[1] NA NA NA NA NA NA
EDIT: I used the as.Date function directly on the factor but got the wrong year in the result:
> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, format = "%m/%d/%Y")
> str(fraud$TRANSACTION.DATE)
Date[1:19936], format: "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27" ...
> head(fraud$TRANSACTION.DATE, 5)
[1] "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27"
EDIT 2: Here's the dput value
> dput(droplevels(head(fraud$TRANSACTION.DATE)))
structure(c(1L, 2L, 1L, 1L, 1L, 2L), .Label = c("2/27/14", "2/28/14"
), class = "factor")
Solution: using %y instead of %Y
> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, "%m/%d/%y")
> head(fraud$TRANSACTION.DATE, 5)
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
Answer 0 (score: 4)
The problem now is that your format string says the dates include the year with century, whereas your dates only contain the year without century. You need to use the %y placeholder, not the %Y one.
dates <- factor(c("2/27/14","2/28/14","2/27/14","2/27/14","2/27/14"))
as.Date(dates, format = "%m/%d/%y") # correct lowercase y
as.Date(dates, format = "%m/%d/%Y") # incorrect uppercase y
> as.Date(dates, format = "%m/%d/%y")
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
> as.Date(dates, format = "%m/%d/%Y")
[1] "14-02-27" "14-02-28" "14-02-27" "14-02-27" "14-02-27"
Notice that R gets it right when you use the correct placeholder, lowercase y.
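One thing to keep in mind with %y is how the century is chosen. As documented for strptime in R, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx. A small sketch with made-up dates (not taken from the fraud data) shows the cutoff:

```r
# Illustrative dates only; these values are not from the original data set.
x <- c("2/27/14", "2/27/68", "2/27/69", "2/27/99")

# Parse with a two-digit year, then print only the four-digit year.
years <- format(as.Date(x, format = "%m/%d/%y"), "%Y")
print(years)
# [1] "2014" "2068" "1969" "1999"
```

For transaction dates from 2014 and 2015 this cutoff is harmless, but it is worth checking if a column could contain dates of birth or other older values.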
What happens with %Y when you don't have a year with century seems to be OS dependent. As you can see, on Linux (Fedora 22) I get no padding of the year part, whereas you are seeing zero-padding.
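Because a wrong format string can yield NA on one system and a year like 0014 on another, it may help to validate the conversion instead of inspecting it by eye. The following is a minimal sketch, assuming dates in m/d/yy form; parse_mdy2 is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: parse "m/d/yy" values and fail loudly on anything
# that does not match, instead of silently returning NA.
parse_mdy2 <- function(x) {
  chr <- as.character(x)                    # works for factors and characters
  out <- as.Date(chr, format = "%m/%d/%y")  # lowercase y: year without century
  bad <- is.na(out) & !is.na(chr)           # input present but not parseable
  if (any(bad)) {
    stop("could not parse: ", paste(unique(chr[bad]), collapse = ", "))
  }
  out
}

# Sample values in the same shape as the question's dput output, plus a
# genuinely missing entry, which is passed through as NA.
dates <- factor(c("2/27/14", "2/28/14", NA))
parsed <- parse_mdy2(dates)
print(parsed)
```

Assigning the result to a new column first, rather than overwriting fraud$TRANSACTION.DATE, also keeps the original factor available if the format turns out to be wrong.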