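I have a data frame with a column called "date", but the date formats in it vary widely. The column's data type is string. I am trying to create "month", "year", and "day of week" columns from this column.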
dataid date
1 Tue 11/3
2 Wed 11/4
3 N/A
4 Monday, February 1, 2016
5 Thursday, March 25, 2015
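What is the best way to do this?
Answer 0: (score: 1)
A reliable approach is to use lubridate::parse_date_time(), but it can mis-parse dates that do not include a year (you may need to correct those manually). In the output below, the two year-less rows were assigned 2018, the year in which the example was run.
You can read help("strptime") to learn more about how to write the orders (format strings) used to parse your dates.
P.S. March 25, 2015 was a Wednesday, not a Thursday as the sample data states.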
library(dplyr)
library(lubridate)

# Read the sample data; single quotes delimit the date strings.
df <- data.table::fread(
"dataid date
1 'Tue 11/3'
2 'Wed 11/4'
3 'N/A'
4 'Monday, February 1, 2016'
5 'Thursday, March 25, 2015'
", quote = "'")

# Try each format in turn; strings that match neither format become NA.
df.new <- df %>%
  mutate(
    date2 = lubridate::parse_date_time(x = date, orders = c("%a %m/%d", "%A, %B %d, %Y"))
  )
#> Warning: 1 failed to parse.
df.new
#>   dataid                     date      date2
#> 1      1                 Tue 11/3 2018-11-03
#> 2      2                 Wed 11/4 2018-11-04
#> 3      3                      N/A       <NA>
#> 4      4 Monday, February 1, 2016 2016-02-01
#> 5      5 Thursday, March 25, 2015 2015-03-25
Created on 2018-10-08 by the reprex package (v0.2.1)
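To make the format codes from help("strptime") more concrete, here is a minimal base R sketch of the same idea: try several formats in turn and keep the first one that parses each string. The helper name parse_multi is hypothetical and is not part of lubridate or base R. It assumes an English locale, because %a, %A, and %B match locale-specific names, and it relies on the documented strptime behaviour of filling in an unspecified year with the current one.

```r
# Hypothetical helper: parse each string with the first format that fits.
# Assumes an English locale, since %a, %A and %B are locale-dependent.
parse_multi <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                          # only retry unparsed entries
    out[todo] <- as.Date(x[todo], format = fmt) # NA where the format does not fit
  }
  out
}

dates <- c("Tue 11/3", "N/A", "Monday, February 1, 2016")
parsed <- parse_multi(dates, c("%a %m/%d", "%A, %B %d, %Y"))
# The first element gets the current year, the second stays NA,
# and the third is parsed with the long format.
print(parsed)
```

As with parse_date_time(), the year assigned to "Tue 11/3" is a guess made by the parser, so those rows still need checking.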
You can then extract the year, month, and day of week from date2 as follows:
df.new %>%
  mutate(
    year = lubridate::year(date2),
    month = lubridate::month(date2),
    day_of_week = weekdays(date2)
  )
#   dataid                     date      date2 year month day_of_week
# 1      1                 Tue 11/3 2018-11-03 2018    11    Saturday
# 2      2                 Wed 11/4 2018-11-04 2018    11      Sunday
# 3      3                      N/A       <NA>   NA    NA        <NA>
# 4      4 Monday, February 1, 2016 2016-02-01 2016     2      Monday
# 5      5 Thursday, March 25, 2015 2015-03-25 2015     3   Wednesday
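Note that the computed weekdays for the year-less rows (Saturday, Sunday) disagree with the weekdays written in the strings (Tue, Wed), which suggests the guessed year is wrong. One possible way to narrow down the year, sketched here under the assumption that the written weekday is trustworthy, is to test candidate years and keep those in which the month/day falls on that weekday. The helper years_matching and the candidate range 2010:2018 are hypothetical choices for illustration.

```r
# Hypothetical helper: years in which month/day falls on a given weekday.
# wday follows the as.POSIXlt convention: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
years_matching <- function(month, day, wday, years) {
  stamps <- sprintf("%d-%02d-%02d", as.integer(years), as.integer(month), as.integer(day))
  dates <- as.Date(stamps, format = "%Y-%m-%d")  # invalid dates become NA
  years[which(as.POSIXlt(dates)$wday == wday)]   # which() drops the NAs
}

# "Tue 11/3": November 3 on a Tuesday (wday = 2)
tue_years <- years_matching(month = 11, day = 3, wday = 2, years = 2010:2018)
print(tue_years)
# → 2015
```

Weekdays repeat every few years, so a wider candidate range returns several years and the result is only as good as the range you are willing to assume.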
Answer 1: (score: 0)
If the weekday and month are written out as text, you can use regular expressions inside a dplyr::case_when() call:
library(dplyr)
df <- df %>%
  mutate(
    day_of_the_week = case_when(
      grepl("mon", date, ignore.case = TRUE) ~ "mon",
      grepl("tue", date, ignore.case = TRUE) ~ "tues",
      grepl("wed", date, ignore.case = TRUE) ~ "wed",
      grepl("thu", date, ignore.case = TRUE) ~ "thurs",
      grepl("fri", date, ignore.case = TRUE) ~ "fri",
      grepl("sat", date, ignore.case = TRUE) ~ "sat",
      grepl("sun", date, ignore.case = TRUE) ~ "sun",
      TRUE ~ NA_character_
    ),
    month = case_when(
      grepl("jan", date, ignore.case = TRUE) ~ "jan",
      grepl("feb", date, ignore.case = TRUE) ~ "feb",
      grepl("mar", date, ignore.case = TRUE) ~ "mar",
      grepl("apr", date, ignore.case = TRUE) ~ "apr",
      grepl("may", date, ignore.case = TRUE) ~ "may",
      grepl("jun", date, ignore.case = TRUE) ~ "jun",
      grepl("jul", date, ignore.case = TRUE) ~ "jul",
      grepl("aug", date, ignore.case = TRUE) ~ "aug",
      grepl("sep", date, ignore.case = TRUE) ~ "sep",
      grepl("oct", date, ignore.case = TRUE) ~ "oct",
      grepl("nov", date, ignore.case = TRUE) ~ "nov",
      grepl("dec", date, ignore.case = TRUE) ~ "dec",
      TRUE ~ NA_character_
    )
  )
#   dataid                     date day_of_the_week month
# 1      1                 Tue 11/3            tues  <NA>
# 2      2                 Wed 11/4             wed  <NA>
# 3      3                     <NA>            <NA>  <NA>
# 4      4 Monday, February 1, 2016             mon   feb
# 5      5 Thursday, March 25, 2015           thurs   mar
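The same keyword matching can be written more compactly by building one alternation pattern from a vector of tokens. This is only a sketch of an alternative, not the answer's method: the helper extract_token is hypothetical, it returns the matched three-letter token in lower case (so "tue" and "thu" rather than "tues" and "thurs"), and it takes the leftmost match in each string.

```r
# Hypothetical helper: return the first token found in each string, or NA.
extract_token <- function(x, tokens) {
  pattern <- paste(tokens, collapse = "|")        # e.g. "Jan|Feb|...|Dec"
  hit <- regexpr(pattern, x, ignore.case = TRUE)  # NA for NA input, -1 for no match
  out <- rep(NA_character_, length(x))
  found <- !is.na(hit) & hit > 0
  out[found] <- tolower(regmatches(x, hit))       # regmatches returns matches only
  out
}

x <- c("Tue 11/3", NA, "Monday, February 1, 2016", "Thursday, March 25, 2015")
months_found <- extract_token(x, month.abb)
days_found <- extract_token(x, c("mon", "tue", "wed", "thu", "fri", "sat", "sun"))
print(data.frame(date = x, day_of_the_week = days_found, month = months_found))
```

month.abb is a built-in constant in base R; the weekday tokens are typed out by hand here because base R has no equivalent constant for weekday abbreviations.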
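Extracting the numeric day and month is harder. You could pick out days of the month between 13 and 31 in a similar way, but otherwise there is no way to know whether a number is a day or a month.
The output above assumes df was read in as follows, with N/A treated as missing: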
df <- read.table(text = "
dataid date
1 'Tue 11/3'
2 'Wed 11/4'
3 N/A
4 'Monday, February 1, 2016'
5 'Thursday, March 25, 2015'",
header = TRUE,
stringsAsFactors = FALSE,
na.strings = "N/A")
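The point about numeric ambiguity can be illustrated with a small base R sketch. It pulls the two numbers out of an "a/b" fragment and labels the order only when one of them is greater than 12 and therefore cannot be a month. The helper classify_slash_date and its labels are hypothetical and are meant only to show the reasoning.

```r
# Hypothetical helper: decide which side of "a/b" can be the day.
classify_slash_date <- function(x) {
  m <- regmatches(x, regexec("([0-9]{1,2})/([0-9]{1,2})", x))
  vapply(m, function(parts) {
    # No "a/b" fragment found (or missing input): nothing to classify.
    if (length(parts) < 3 || anyNA(parts)) return(NA_character_)
    a <- as.integer(parts[2])
    b <- as.integer(parts[3])
    if (a > 12 && b <= 12) {
      "day/month"      # a cannot be a month
    } else if (b > 12 && a <= 12) {
      "month/day"      # b cannot be a month
    } else if (a <= 12 && b <= 12) {
      "ambiguous"      # either order is a valid date
    } else {
      NA_character_    # both above 12: not a day/month pair
    }
  }, character(1))
}

labels <- classify_slash_date(c("Tue 11/3", "25/3", "3/25", "Monday, February 1, 2016"))
print(labels)
```

Both year-less rows in the sample data ("11/3" and "11/4") come out as ambiguous, so the month/day order there has to come from knowledge of the data source rather than from the strings themselves.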