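Does anyone know of a package or function that can detect breaks in the formatting of a date column in R, i.e. find the positions where the format of a date vector changes? For example: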
11/2/90
12/2/90
.
.
.
15/Feb/1990
16/Feb/1990
.
.
.
20/February/90
21/February/90
.
.
.
25/2/1990
26/2/1990
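Answer 0 (score: 6)
Do you only need to detect the breaks, or do you eventually want to convert the dates as well?
The guess_formats function in the lubridate package is useful in both cases. Here is an example using your data: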
library(lubridate)
dates = c("11/2/90",
"12/2/90",
"15/Feb/1990",
"16/Feb/1990",
"20/February/90",
"21/February/90",
"25/2/1990",
"26/2/1990")
guess_formats(dates, orders="dmy")
dmy dmy dmy dmy dmy dmy dmy dmy
"%d/%m/%y" "%d/%m/%y" "%d/%b/%Y" "%d/%b/%Y" "%d/%B/%y" "%d/%B/%y" "%d/%m/%Y" "%d/%m/%Y"
dates2 = as.Date(dates, format=guess_formats(dates, orders="dmy"))
dates2
[1] "1990-02-11" "1990-02-12" "1990-02-15" "1990-02-16" "1990-02-20" "1990-02-21" "1990-02-25" "1990-02-26"
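To answer the original question about where the breaks are, the guessed formats can be compared with their neighbours. The following is a minimal base R sketch, not part of the answer above. It assumes exactly one format string per date, as in the output shown, so the format vector is copied in literally rather than recomputed; `fmts`, `break_at` and `runs` are illustrative names.

```r
# Format strings, one per date, copied from the guess_formats() output above.
# Assumption: one guessed format per input element.
fmts <- c("%d/%m/%y", "%d/%m/%y", "%d/%b/%Y", "%d/%b/%Y",
          "%d/%B/%y", "%d/%B/%y", "%d/%m/%Y", "%d/%m/%Y")

# A break starts wherever a format differs from the one just before it.
break_at <- which(fmts[-1] != fmts[-length(fmts)]) + 1
break_at
# [1] 3 5 7

# rle() gives the same information as runs of identical formats.
runs <- rle(fmts)
run_table <- data.frame(
  format = runs$values,
  start  = cumsum(c(1, head(runs$lengths, -1))),  # first index of each run
  length = runs$lengths
)
run_table
```

Here `break_at` holds the index of the first date in each new format, and `run_table` lists each run with its starting position and length.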
Answer 1 (score: 0)
Not as elegant as asb's solution, but it uses only base R functions:
txt<-readLines(textConnection("11/2/90
12/2/90
15/Feb/1990
16/Feb/1990
20/February/90
21/February/90
25/2/1990
26/2/1990"))
# split each date on "/" and count the characters in each field (day, month, year)
len_list<-lapply(strsplit(txt,split="/"),nchar)
len_list
[[1]]
[1] 2 1 2
[[2]]
[1] 2 1 2
[[3]]
[1] 2 3 4
[[4]]
[1] 2 3 4
[[5]]
[1] 2 8 2
[[6]]
[1] 2 8 2
[[7]]
[1] 2 1 4
[[8]]
[1] 2 1 4
# sum the field lengths for each date; a change in the sum implies a format break
sum_list<-lapply(len_list,sum)
sum_list
[[1]]
[1] 5
[[2]]
[1] 5
[[3]]
[1] 9
[[4]]
[1] 9
[[5]]
[1] 12
[[6]]
[1] 12
[[7]]
[1] 7
[[8]]
[1] 7
# index of the last date before each format break
idx<-which(unlist(lapply(2:length(sum_list),function(x) unlist(sum_list[x-1])!=unlist(sum_list[x]))))
idx
[1] 2 4 6
# last date before each format break
txt[idx]
[1] "12/2/90" "16/Feb/1990" "21/February/90"
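One caveat with comparing summed lengths is that two different formats can share the same total, and a single-digit day such as "1/2/90" would look like a break against "11/2/90". A possible variation, sketched below in base R, classifies each field by type instead of by width. The helper `classify_field` and the signature letters are hypothetical names chosen for illustration, and the sketch assumes "/" is always the separator.

```r
txt <- c("11/2/90", "12/2/90", "15/Feb/1990", "16/Feb/1990",
         "20/February/90", "21/February/90", "25/2/1990", "26/2/1990")

# Hypothetical helper: map one field to a coarse type code.
classify_field <- function(field) {
  if (grepl("^[0-9]{1,2}$", field)) {
    "n"   # 1-2 digit number (day, month or 2-digit year)
  } else if (grepl("^[0-9]{4}$", field)) {
    "Y"   # 4-digit year
  } else if (grepl("^[A-Za-z]{3}$", field)) {
    "b"   # 3-letter month name (note: "May" is ambiguous)
  } else if (grepl("^[A-Za-z]+$", field)) {
    "B"   # longer month name
  } else {
    "?"   # anything unexpected
  }
}

# Build one signature per date, e.g. "n/b/Y" for "15/Feb/1990".
sig <- vapply(
  strsplit(txt, "/", fixed = TRUE),
  function(parts) paste(vapply(parts, classify_field, character(1)), collapse = "/"),
  character(1)
)

# Index of the first date in each new format.
break_start <- which(sig[-1] != sig[-length(sig)]) + 1
break_start
# [1] 3 5 7
```

Subtracting 1 from `break_start` gives the same positions as `idx` above, i.e. the last date before each break.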