Expanding number ranges in a file

Time: 2016-06-20 08:32:03

Tags: bash shell awk

I have a file of delimited integers that I extracted from elsewhere and dumped into a file. Some lines contain a range, as shown below:

Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3-9,10 have problems
Cars 1-5,5-10 are in the depot
Trains 1-10 are on time

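Is it possible to expand the ranges in the text file so that every individual number is listed, keeping the delimiter? The text on either side of the integers could be anything, and I need to preserve it.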

Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time

I think this can be done relatively easily with awk, let alone any other scripting language. Any help is much appreciated.

4 Answers:

Answer 0: (score: 3)

You haven't tagged the question with perl, but I'd recommend Perl in this case:

perl -pe 's/(\d+)-(\d+)/join(",", $1..$2)/ge' file

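This replaces every occurrence of one or more digits, followed by a hyphen, followed by one or more digits. It uses the numbers it has captured to build a list running from the first number to the second, and joins that list with commas.

The e modifier is needed here so that the replacement part of the substitution is evaluated as an expression; the g modifier applies the substitution to every range on the line.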

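Avoiding repeated values and sorting the list makes things a little more complicated. At that point, I'd recommend a script rather than a one-liner: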

use strict;
use warnings;
use List::MoreUtils qw(uniq);

while (<>) {
    s/(\d+)-(\d+)/join(",", $1..$2)/ge;
    if (/(.*\s)((\d+,)+\d+)(.*)/) {
        my @list = sort { $a <=> $b } uniq split(",", $2);
        $_ = $1 . join(",", @list) . $4 . "\n";
    }
} continue {
    print;
}

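After expanding the ranges (as in the one-liner), I re-parse the line to extract the list of values. I've used uniq from List::MoreUtils to remove any duplicates, and sorted the values numerically. Note that List::MoreUtils is a CPAN module rather than part of the Perl core, so it may need to be installed first.

Call the script like perl script.pl file.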

Answer 1: (score: 0)

A solution using awk (GNU awk is needed for the four-argument split() and for asort()):

{
    result = "";
    # gawk only: seps[i] holds the separator found after fields[i].
    count = split($0, fields, /[ ,-]+/, seps);
    for (i = 1; i <= count; i++) {
        if (fields[i] ~ /^[0-9]+$/) {
            if (seps[i] == ",") {
                numbers[fields[i]] = fields[i] + 0;
            } else if (seps[i] == "-") {
                # Expand the range, including its first value.
                for (j = fields[i] + 0; j <= fields[i+1]; j++) {
                    numbers[j] = j;
                }
            } else if (seps[i] == " ") {
                # End of the list: sort the collected values and emit them.
                numbers[fields[i]] = fields[i] + 0;
                c = asort(numbers);
                for (r = 1; r < c; r++) {
                    result = result numbers[r] ",";
                }
                result = result numbers[c] " ";
                delete numbers;    # start the next list from scratch
            }
        } else {
            result = result fields[i] seps[i];
        }
    }
    print result;
}

Answer 2: (score: 0)

$ cat tst.awk
match($0,/[0-9,-]+/) {
    split(substr($0,RSTART,RLENGTH),numsIn,/,/)
    numsOut = ""
    delete seen
    for (i=1;i in numsIn;i++) {
        n = split(numsIn[i],range,/-/)
        for (j=range[1]; j<=range[n]; j++) {
            if ( !seen[j]++ ) {
                numsOut = (numsOut=="" ? "" : numsOut ",") j
            }
        }
    }
    print substr($0,1,RSTART-1) numsOut substr($0,RSTART+RLENGTH)
}

$ awk -f tst.awk file
Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time

Answer 3: (score: 0)

Another awk:

$ awk '{while(match($0, /[0-9]+-[0-9]+/))
          {k=substr($0, RSTART, RLENGTH); 
           split(k,a,"-"); 
           f=a[1]; 
           for(j=a[1]+1; j<=a[2]; j++) f=f","j; 
           sub(k,f)}}1' file

Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time

Note that, because the ranges overlap, Cars 1-5,5-10 ends up with two 5 values when expanded.
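If the repeated value is unwanted, one possible way to drop it is to add a second pass that rebuilds the number list without duplicates. The sketch below is an assumption-laden variant of the command above, not the answerer's code: the function name expand_unique is hypothetical, only the first comma-separated run of numbers on each line is deduplicated, and the values are kept in order of first appearance rather than sorted.

```shell
# Hypothetical helper: expand N-M ranges read from standard input, then
# drop repeated values from the first comma-separated run of numbers.
expand_unique() {
    awk '{
        # Step 1: expand each range in place.
        while (match($0, /[0-9]+-[0-9]+/)) {
            k = substr($0, RSTART, RLENGTH)
            split(k, a, "-")
            f = a[1]
            for (j = a[1] + 1; j <= a[2] + 0; j++) f = f "," j
            $0 = substr($0, 1, RSTART - 1) f substr($0, RSTART + RLENGTH)
        }
        # Step 2: rebuild the number list, keeping each value once.
        if (match($0, /[0-9]+(,[0-9]+)+/)) {
            head = substr($0, 1, RSTART - 1)
            tail = substr($0, RSTART + RLENGTH)
            n = split(substr($0, RSTART, RLENGTH), v, ",")
            split("", seen)   # portable way to empty the array
            out = ""
            for (i = 1; i <= n; i++) {
                if (!(v[i] in seen)) {
                    seen[v[i]] = 1
                    out = (out == "" ? "" : out ",") v[i]
                }
            }
            $0 = head out tail
        }
        print
    }'
}

printf 'Cars 1-5,5-10 are in the depot\n' | expand_unique
# prints: Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
```

The splice with substr() replaces the sub(k,f) call so that the match position found by match() is reused directly; split("", seen) is used to clear the array because it works in awk implementations that lack delete on a whole array.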