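I have data collected from the internet. The date column contains values like "1 month ago", "2 years ago", "4 days ago", and I need to convert this format to actual dates.
I tried: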
googleR$`Date/Time` <- as.Date.character(googleR$`Date/Time`,
format = format(googleR$`Date/Time`),
tryFormats = c("%d/%m/%Y"))
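But it just gave me today's date for every record.
I also tried many other things, but kept getting the same error:
"character string is not in a standard unambiguous format"
Here is the data I am trying to convert: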
format.factor(googleR$`Date/Time`)
[1] "7 months ago " "2 months ago " "a week ago " "2 years ago " "2 years ago " "5 months ago " "10 months ago"
[8] "2 years ago " "4 years ago " "2 years ago " "2 years ago " "11 months ago" "3 years ago " "3 years ago "
[15] "2 years ago " "2 years ago " "10 months ago" "10 months ago" "a year ago " "a year ago " "2 years ago "
[22] "2 years ago " "2 years ago " "2 years ago " "2 years ago " "2 years ago " "3 years ago " "4 years ago "
[29] "4 years ago " "a week ago " "a week ago " "2 weeks ago " "a month ago " "2 months ago " "5 months ago "
[36] "7 months ago " "7 months ago " "8 months ago " "10 months ago" "10 months ago" "a year ago " "a year ago "
[43] "a year ago " "a year ago " "a year ago " "a year ago " "a year ago " "2 years ago " "2 years ago "
[50] "2 years ago " "4 years ago " "6 years ago "
Answer 0: (score: 1)
You can use sub to remove "ago" from each element, and then use add_with_rollback from lubridate as follows, where x is the character vector of relative dates:
library(lubridate)
add_with_rollback(Sys.time(), - as.period(sub("\\s+ago", "", x)))
Result
"2019-02-28 18:13:18 CET" "2017-03-31 18:13:18 CEST" "2019-03-27 18:13:18 CET"
Answer 1: (score: 0)
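You will probably first need to use a regular expression to work out what kind of time interval you have, and go from there. I like the stringr package for this. For example, if IN is your input string and OUT is the output you want, you could say
library(stringr)
if(str_detect(IN, "day")){OUT <- as.numeric(str_extract(IN, "^[0-9]*"))}
Now that you have the number of days, you can do something like
Sys.Date() - OUT
to get the date. You can then do essentially the same thing for months and years. Inevitably this will be approximate, since for example not all months have the same number of days, but it sounds like your input data is not exactly precise anyway.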
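The same detect-the-unit-then-subtract approach can be sketched for all units at once. This version swaps stringr for base R's sub so that it runs without extra packages. The helper name approx_date, the sample input, and the days-per-unit table (30 days per month, 365 per year) are assumptions for illustration, and the results are approximate for exactly the reason given above.

```r
# Hypothetical helper: approximate "N unit(s) ago" as a Date by counting back in days
approx_date <- function(x, today = Sys.Date()) {
  s <- trimws(as.character(x))                  # handles factors and stray whitespace
  s <- sub("^an?\\s+", "1 ", s)                 # "a week ago" -> "1 week ago"
  n <- as.numeric(sub("^([0-9]+).*$", "\\1", s))          # leading count
  unit <- sub("^[0-9]+\\s+([a-z]+)\\s+ago$", "\\1", s)    # "months", "year", ...
  unit <- sub("s$", "", unit)                   # singularise: "months" -> "month"
  # Rough day counts per unit (an assumption; months and years vary in length)
  days_per_unit <- c(day = 1, week = 7, month = 30, year = 365)
  # Unrecognised units index to NA, so those entries come back as NA dates
  today - n * unname(days_per_unit[unit])
}

# Hypothetical input and a fixed reference date, chosen only for reproducibility
dates <- approx_date(c("4 days ago", "a week ago ", "2 months ago", "a year ago"),
                     today = as.Date("2019-03-31"))
print(dates)
```

If calendar-exact month and year arithmetic matters, subtracting lubridate periods as in the other answer is probably the better fit; this version only trades accuracy for having no dependencies.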