Detecting breaks/changes in the format of a date column

Time: 2014-06-11 08:25:27

Tags: r date

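Does anyone know of a package or function in R that can detect breaks in the formatting of a date column, i.e. find the positions where the format of a date vector changes, as in: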

11/2/90
12/2/90
.
.
.
15/Feb/1990
16/Feb/1990
.
.
.
20/February/90
21/February/90
.
.
.
25/2/1990
26/2/1990

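2 answers:

Answer 0: (score: 6)

Do you only need to detect the breaks, or do you eventually want to convert the dates as well?

The guess_formats function in lubridate is useful in both cases. See this example using your data: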

dates = c("11/2/90",
          "12/2/90",
          "15/Feb/1990",
          "16/Feb/1990",
          "20/February/90",
          "21/February/90",
          "25/2/1990",
          "26/2/1990")

library(lubridate)  # provides guess_formats()

guess_formats(dates, orders="dmy")
       dmy        dmy        dmy        dmy        dmy        dmy        dmy        dmy 
"%d/%m/%y" "%d/%m/%y" "%d/%b/%Y" "%d/%b/%Y" "%d/%B/%y" "%d/%B/%y" "%d/%m/%Y" "%d/%m/%Y"

dates2 = as.Date(dates, format=guess_formats(dates, orders="dmy"))
dates2
[1] "1990-02-11" "1990-02-12" "1990-02-15" "1990-02-16" "1990-02-20" "1990-02-21" "1990-02-25" "1990-02-26"
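
If you only need the positions of the breaks, one option is to compare each guessed format with the one before it. This is a minimal sketch in base R that assumes exactly one guessed format per date; the fmts vector is copied from the output shown above rather than recomputed, and the names fmts, breaks and runs are only illustrative:

```r
# Formats copied from the guess_formats() output above (assumption: one per date)
fmts <- c("%d/%m/%y", "%d/%m/%y", "%d/%b/%Y", "%d/%b/%Y",
          "%d/%B/%y", "%d/%B/%y", "%d/%m/%Y", "%d/%m/%Y")

# Index of the first element of each new format:
# compare every element with its predecessor
breaks <- which(fmts[-1] != fmts[-length(fmts)]) + 1L
breaks
# [1] 3 5 7

# Run-length encoding gives the same information as (format, run length) pairs
runs <- rle(fmts)
runs$lengths
# [1] 2 2 2 2
```

Month names matched by %b and %B depend on the locale, so the guessed formats may differ on a non-English system.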

Answer 1: (score: 0)

Not as elegant as asb's solution, but it uses only base R functions:

txt<-readLines(textConnection("11/2/90
12/2/90
15/Feb/1990
16/Feb/1990
20/February/90
21/February/90
25/2/1990
26/2/1990"))

#split date and check length of each element(day,month,year)
len_list<-lapply(strsplit(txt,split="/"),nchar)
[[1]]
[1] 2 1 2

[[2]]
[1] 2 1 2

[[3]]
[1] 2 3 4

[[4]]
[1] 2 3 4

[[5]]
[1] 2 8 2

[[6]]
[1] 2 8 2

[[7]]
[1] 2 1 4

[[8]]
[1] 2 1 4

#Sum of the element lengths; a change in the sum implies a format break
sum_list<-lapply(len_list,sum)
[[1]]
[1] 5

[[2]]
[1] 5

[[3]]
[1] 9

[[4]]
[1] 9

[[5]]
[1] 12

[[6]]
[1] 12

[[7]]
[1] 7

[[8]]
[1] 7
#index of the last element before each format break
idx<-which(unlist(lapply(2:length(sum_list),function(x) unlist(sum_list[x-1])!=unlist(sum_list[x]))))
[1] 2 4 6
#dates just before each format break
txt[idx]
[1] "12/2/90"        "16/Feb/1990"    "21/February/90"
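
Note that the sum of the field lengths is only a heuristic. "5/Feb/90" and "5/2/1990" both sum to 6 although their formats differ, while "9/2/90" and "10/2/90" share a format but sum to 4 and 5. A rough alternative, still in base R, is to describe each date by the kind of month field and the number of year digits. This is only a sketch: date_signature is a hypothetical helper, and it assumes every string has exactly three "/"-separated fields in day/month/year order.

```r
txt <- c("11/2/90", "12/2/90", "15/Feb/1990", "16/Feb/1990",
         "20/February/90", "21/February/90", "25/2/1990", "26/2/1990")

# Hypothetical helper: label each date by month style and year width.
# Assumes day/month/year fields separated by "/".
date_signature <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) {
    month_kind <- if (grepl("^[0-9]+$", p[2])) {
      "num"                      # numeric month, e.g. "2"
    } else if (nchar(p[2]) <= 3) {
      "abbr"                     # short name, e.g. "Feb" ("May" is ambiguous)
    } else {
      "full"                     # long name, e.g. "February"
    }
    paste(month_kind, nchar(p[3]), sep = ":")
  }, character(1))
}

sig <- date_signature(txt)

# Index of the first element of each new format
breaks <- which(sig[-1] != sig[-length(sig)]) + 1L
breaks
# [1] 3 5 7
txt[breaks]
# [1] "15/Feb/1990"    "20/February/90" "25/2/1990"
```

Unlike idx above, breaks points at the first date written in each new format rather than the last date before the change.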