Discrete to continuous number ranges via awk

Time: 2017-06-26 15:35:06

Tags: bash awk integer range continuous

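Suppose a text file named file contains several discrete number ranges, one per line. Each range is preceded by a string (the range name). The lower and upper bounds of each range are separated by a dash, and each range is terminated by a semicolon. The ranges are sorted (i.e., range 101-297 comes before 1299-1301) and do not overlap.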

$ cat file
foo  101-297;
bar  1299-1301;
baz  1314-5266;

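Note that in the example above, the three ranges do not form a continuous range starting at integer 1.

I believe awk is the appropriate tool for filling in the missing number ranges, so that all ranges together form a continuous range from {1} to {upper bound of the last range}. If so, what awk command/function would you use to perform the task?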

$ cat file | sought_awk_command
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
new3 1302-1313;
baz  1314-5266;

-

Edit 1: On closer evaluation, the code suggested below fails on another simple example.

$ cat example2
foo  101-297;
bar  1299-1301;
baz  1302-1314; # Notice that ranges "bar" and "baz" are continuous to one another
qux  1399-5266;

$ awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' example2
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
baz  1302-1314;
new3 1302-1398; # ERROR HERE: Notice that range "new3" starts right after the upper bound of "bar" (1301) instead of "baz" (1314), so it overlaps "baz".
qux  1399-5266;

-

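Edit 2: Many thanks to RavinderSingh13 for helping with this problem. However, the suggested code still produces output that does not match the stated goal.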

$ cat example3
foo  35025-35144;
bar  35259-35375;
baz  35376-35624;
qux  37911-39434;

$ awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' example3
new1 1-35024;
foo  35025-35144;
new2 35145-35258;
bar  35259-35375;
new3 35376-35375; # ERROR HERE: Notice that range "new3" has been added, even though ranges "bar" and "baz" are contiguous.
baz  35376-35624;
new4 35625-37910;
qux  37911-39434;

3 answers:

Answer 0 (score: 2)

Try:

awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1'   Input_file

EDIT: Adding a non-one-liner form of the solution now, with a proper explanation.

awk -F'[ -]' '                                        ### Field separator: a single space or a dash.
                $3-Q>1{                               ### If the 3rd field (lower bound) minus Q (previous upper bound) is greater than 1, do the following.
                        print "new"++o,Q+1"-"$3-1";"; ### Print "new" plus the incremented counter o, then Q+1, a dash, $3-1 and a semicolon.
                        Q=$4                          ### Store the 4th field (upper bound) of the current line in Q.
                      }
                1                                     ### Print the current line.
             ' Input_file                             ### The input file.
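
To see why the bounds are read from $3 and $4 rather than $2 and $3, it can help to dump the fields awk produces under -F'[ -]'. This is only an illustrative sketch: show_fields is a made-up helper name, and the input is assumed to have two spaces between the name and the range, as in the question's samples.

```shell
# show_fields is a hypothetical helper, not part of the answer.
# It prints every field of each input line when the separator is [ -].
show_fields() {
  awk -F'[ -]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf 'foo  101-297;\n' | show_fields
# 1=[foo]
# 2=[]
# 3=[101]
# 4=[297;]
```

Each single space counts as a separator, so the two spaces leave an empty $2. The trailing semicolon stays attached to $4 and is ignored only when awk converts the value to a number in arithmetic such as $3-Q or Q+1.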

EDIT2: Adding one more answer, per the OP's conditions.

 awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}'   Input_file
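
Edit 2 of the question shows that the >=1 test fires for adjacent ranges and emits an empty range such as 35376-35375. One possible adjustment, offered as a sketch rather than as the answerer's code, keeps the original >1 test but updates Q on every line. The wrapper name fill_gaps is hypothetical, and the input is assumed to be sorted, non-overlapping and separated by two spaces, as in the question.

```shell
# fill_gaps is a hypothetical wrapper name used only for this sketch.
fill_gaps() {
  awk -F'[ -]' '
    # A gap exists only if at least one integer lies between Q and $3.
    $3 - Q > 1 { print "new" ++o, Q + 1 "-" $3 - 1 ";" }
    # Remember the upper bound of every line, gap or not, then print the line.
    { Q = $4 + 0; print }
  ' "$@"
}

printf 'foo  35025-35144;\nbar  35259-35375;\nbaz  35376-35624;\nqux  37911-39434;\n' | fill_gaps
# new1 1-35024;
# foo  35025-35144;
# new2 35145-35258;
# bar  35259-35375;
# baz  35376-35624;
# new3 35625-37910;
# qux  37911-39434;
```

Adding 0 to $4 strips the trailing semicolon by forcing a numeric conversion, so Q always holds a plain number.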

Answer 1 (score: 0)

This has no problem with ranges that overlap, as in the original example2, where bar 1299-1301; and baz 1301-1314; overlap at 1301.

$ cat tst.awk
{ split($2,curr,/[-;]/); currStart=curr[1]; currEnd=curr[2] }
currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
{ print; prevEnd=currEnd }

$ awk -f tst.awk file
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
new3 1302-1313;
baz  1314-5266;

$ awk -f tst.awk example2
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
baz  1301-1314;
new3 1315-1398;
qux  1399-5266;

$ awk -f tst.awk example3
new1 1-35024;
foo  35025-35144;
new2 35145-35258;
bar  35259-35375;
baz  35376-35624;
new3 35625-37910;
qux  37911-39434;
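
The script above copes with ranges that touch or overlap at their ends. If a range could ever sit entirely inside the previous one (the question rules this out, so it is purely an assumption for illustration), prevEnd would move backwards and a spurious gap would be printed. A hedged variant that only ever advances prevEnd, wrapped in a hypothetical shell function named fill_gaps_max:

```shell
# fill_gaps_max is a hypothetical name; the logic follows tst.awk
# except that prevEnd is never allowed to decrease.
fill_gaps_max() {
  awk '
    { split($2, curr, /[-;]/); currStart = curr[1] + 0; currEnd = curr[2] + 0 }
    currStart > prevEnd + 1 { print "new" ++cnt, prevEnd + 1 "-" currStart - 1 ";" }
    { print; if (currEnd > prevEnd) prevEnd = currEnd }
  ' "$@"
}

printf 'foo  1-100;\nbar  20-30;\nbaz  150-200;\n' | fill_gaps_max
# foo  1-100;
# bar  20-30;
# new1 101-149;
# baz  150-200;
```

On the three sample files from the question the output should be the same as that of tst.awk, since their upper bounds never decrease.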

Answer 2 (score: 0)

$ cat file1
foo  2-100
bar  102-200
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file1
new1 1-1;
foo  2-100
new2 101-101;
bar  102-200
$ cat file2
foo  101-297;
bar  1299-1301;
baz  1314-5266;
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file2
new1 1-100;
foo  101-297;
new2 298-1298;
bar  1299-1301;
new3 1302-1313;
baz  1314-5266;

Explanation:

$ awk -F' +|[-;]' '                   # FS is ; - or a bunch of spaces
p+1 < $2 {                           # if p revious $3+1 is still less than new $2
    print "new"++q,p+1 "-" $2-1 ";"  # print a "new" line
}
p=$3                                 # set future p and implicit print of record *
' file2                              # * as all values are above 0
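
The footnote marked * matters: the bare pattern p=$3 prints the record only when the assigned value is truthy, so a line whose upper bound is 0 or missing would be silently dropped. A sketch that separates the assignment from the print, assuming the same input layout (fill_gaps_explicit is a hypothetical name):

```shell
# fill_gaps_explicit is a hypothetical name used only for this sketch.
fill_gaps_explicit() {
  awk -F' +|[-;]' '
    # Same gap test as above: previous upper bound + 1 is below the new lower bound.
    p + 1 < $2 { print "new" ++q, p + 1 "-" $2 - 1 ";" }
    # Store the upper bound and print unconditionally, whatever its value.
    { p = $3; print }
  ' "$@"
}

printf 'foo  101-297;\nbar  1299-1301;\nbaz  1314-5266;\n' | fill_gaps_explicit
# new1 1-100;
# foo  101-297;
# new2 298-1298;
# bar  1299-1301;
# new3 1302-1313;
# baz  1314-5266;
```

The trade-off is a slightly longer program in exchange for output that does not depend on the values being above 0.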