Finding consecutive ranges

Time: 2016-04-15 10:30:22

Tags: bash awk

Given a set of dates, I want to find the ranges of consecutive days.

Given the following sample

2016-01-01
2016-01-02
2016-01-03
2016-01-04
2016-01-05
2016-01-06
2016-01-08
2016-01-09
2016-01-10
2016-01-11
2016-01-12
2016-01-15
2016-01-16
2016-01-17
2016-01-20
2016-01-21
2016-01-30
2016-01-31
2016-02-01

I would like to get the following result

2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01

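I came across this question, which is almost the reverse of what I want, but for integers. I have worked out the following, which works with integers.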

awk 'NR==1 {l=$1; n=$1} {if ($1==n){n=$1+1} else{print l"-"n-1; l=$1 ;n=$1+1} } END {print l"-"$1}' file.txt
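To see what the one-liner does, here is a small run on made-up integers. The input values are an assumption chosen for illustration; they are not data from the question.

```shell
# Hypothetical integer input, one value per line.
out=$(printf '%s\n' 1 2 3 5 6 9 |
  awk 'NR==1 {l=$1; n=$1} {if ($1==n){n=$1+1} else{print l"-"n-1; l=$1 ;n=$1+1} } END {print l"-"$1}')
printf '%s\n' "$out"
```

With gawk or mawk this should print 1-3, 5-6 and 9-9. The comparison `$1==n` relies on "next value is previous value plus one", which is why it does not carry over to dates directly.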

3 answers:

Answer 0: (score: 0)

If you have GNU Awk you can use its time functions.

gawk -F - 'NR==1 || $1 "-" $2 "-" $3 != following {
    if (following != "") print start "-" latest;
    start = $1 "-" $2 "-" $3
    this = mktime($1 " " $2 " " $3 " 0 0 0")
  }
  {
    this += 24*60*60
    following = strftime("%F", this)
    latest = $1 "-" $2 "-" $3 }
  END { if (start != "") print start "-" latest }' filename

Unit ranges will print like "2016-04-15-2016-04-15" which is a bit of a wart, but easy to fix if you need to. The END block tests start != "" rather than start != latest, so a final single-day range is still printed and empty input prints nothing. This should at least get you started.
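One possible way to tidy up unit ranges is a small post-processing step, sketched here with any awk. It assumes every date is exactly 10 characters (YYYY-MM-DD) and that the two dates are joined by a single "-"; the variable names are made up for this example.

```shell
# Collapse "start-end" lines whose two dates are identical into one date.
collapsed=$(printf '%s\n' 2016-01-30-2016-01-31 2016-02-01-2016-02-01 |
  awk '{
    first = substr($0, 1, 10)   # start date
    last  = substr($0, 12)      # end date, after the separating "-"
    print (first == last ? first : $0)
  }')
printf '%s\n' "$collapsed"
```

The first line passes through unchanged and the second is reduced to 2016-02-01. The same filter can be piped after any of the scripts that use "-" as the separator.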

Answer 1: (score: 0)

GAWK:

#!/bin/awk -f
BEGIN{
        FS="-"
}
{
        a[NR]=mktime($1" "$2" "$3" 0 0 0")
        b[NR]=$2;
        if ( (a[NR-1]+86400) != a[NR] || b[NR-1]!=b[NR] ) {
                if(NR!=1){
                        print s" - "strftime("%Y-%m-%d",a[NR-1])
                };
                s=$0
        }
}
END{
        print s" - "$0
}

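The script uses the awk time function mktime to build an array a, indexed by NR, whose values are the epoch times derived from $0.

Array b, also indexed by NR, holds the month from $2. If the previous line's epoch time + 86400 (+1 day) is not equal to the current line's epoch time, or the previous line's month differs from the current line's month, then, except on the first line, the script prints s" - "strftime("%Y-%m-%d",a[NR-1]) and reassigns s, the start date of the range, to $0.

END: print the last start date s and the last line $0.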

Answer 2: (score: 0)

With GNU awk for mktime():

$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currSecs = mktime( $1" "$2" "$3" 0 0 0" ) }
(currSecs - prevSecs) > (24*60*60) {
    if (NR>1) {
        print startDate, prevDate
    }
    startDate = $0
}
{ prevSecs = currSecs; prevDate = $0 }
END { print startDate, prevDate }

$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-02-01
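mktime() is a GNU extension and interprets the date in the local time zone, so in zones with daylight saving time two consecutive midnights are not always exactly 24*60*60 seconds apart. As an alternative sketch (not part of the original answer), the day difference can be computed arithmetically in any POSIX awk. The helper name jdn and the sample dates are assumptions introduced for this example; jdn returns the Julian Day Number of a Gregorian date, so consecutive days differ by exactly 1.

```shell
# Portable sketch: merge consecutive days across month ends without mktime().
merged=$(printf '%s\n' 2016-01-30 2016-01-31 2016-02-01 2016-02-03 \
                       2016-02-28 2016-02-29 2016-03-01 |
  awk '
    # Julian Day Number of a Gregorian calendar date (integer arithmetic).
    function jdn(y, m, d,    a, yy, mm, leap) {
      a    = int((14 - m) / 12)      # 1 for January and February, else 0
      yy   = y + 4800 - a
      mm   = m + 12 * a - 3          # March = 0 ... February = 11
      leap = int(yy / 4) - int(yy / 100) + int(yy / 400)
      return d + int((153 * mm + 2) / 5) + 365 * yy + leap - 32045
    }
    BEGIN { FS = OFS = "-" }
    { currDay = jdn($1, $2, $3) }
    NR == 1 || (currDay - prevDay) > 1 {   # gap found: close the open range
      if (NR > 1) print startDate, prevDate
      startDate = $0
    }
    { prevDay = currDay; prevDate = $0 }
    END { if (NR > 0) print startDate, prevDate }
  ')
printf '%s\n' "$merged"
```

On this input the ranges are 2016-01-30-2016-02-01, 2016-02-03-2016-02-03 and 2016-02-28-2016-03-01, so both the month boundary and the leap day are treated as consecutive.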

If ranges should restart when the month changes (as your expected output and the comments on the question indicate), this works with any awk:

$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currYrMth = $1 FS $2; currDay = $3 }
(currYrMth != prevYrMth) || ((currDay - prevDay) > 1) {
    if (NR>1) {
        print startDate, prevDate
    }
    startDate = $0
}
{ prevYrMth = currYrMth; prevDay = currDay; prevDate = $0 }
END { print startDate, prevDate }

$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
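Every script above compares each line only with the line before it, so they all rely on the input being in ascending order with no repeated dates. If that is not guaranteed, one option is to sort first; ISO dates sort correctly as plain text. The sketch below feeds a hypothetical unsorted sample with a duplicate through sort -u and then through the any-awk script from this answer, inlined so the example is self-contained.

```shell
# Hypothetical unsorted input containing a duplicate date.
result=$(printf '%s\n' 2016-01-03 2016-01-01 2016-01-02 2016-01-02 |
  LC_ALL=C sort -u |
  awk '
    BEGIN { FS=OFS="-" }
    { currYrMth = $1 FS $2; currDay = $3 }
    (currYrMth != prevYrMth) || ((currDay - prevDay) > 1) {
      if (NR>1) print startDate, prevDate
      startDate = $0
    }
    { prevYrMth = currYrMth; prevDay = currDay; prevDate = $0 }
    END { print startDate, prevDate }
  ')
printf '%s\n' "$result"
```

After sorting and de-duplicating, the three dates form the single range 2016-01-01-2016-01-03.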