I have a file of delimited integers that I extracted from elsewhere and dumped into a file. Some lines contain a range, like this:
Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3-9,10 have problems
Cars 1-5,5-10 are in the depot
Trains 1-10 are on time
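Is it possible to expand the ranges in the text file so that every individual number is listed, keeping the delimiter? The text on either side of the integers can be anything, and I need to preserve it. For the input above, the result should be: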
Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time
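I imagine this can be done relatively easily with awk, let alone any other scripting language. Any help is much appreciated.
Answer 0 (score: 3)
You haven't used the perl tag, but I would suggest using it in this case: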
perl -pe 's/(\d+)-(\d+)/join(",", $1..$2)/ge' file
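This replaces every occurrence of one or more digits, followed by a hyphen, followed by one or more digits. It uses the captured numbers to build a list running from the first number to the second, and joins that list with commas.
The e modifier is needed here so that the replacement part of the substitution is evaluated as an expression.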
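Avoiding duplicate values and sorting the list makes things a little more complicated. At that point I would recommend a script rather than a one-liner: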
use strict;
use warnings;
use List::MoreUtils qw(uniq);

while (<>) {
    s/(\d+)-(\d+)/join(",", $1..$2)/ge;
    if (/(.*\s)((\d+,)+\d+)(.*)/) {
        my @list = sort { $a <=> $b } uniq split(",", $2);
        $_ = $1 . join(",", @list) . $4 . "\n";
    }
} continue {
    print;
}
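After expanding the ranges (as in the one-liner), I re-parse the line to extract the list of values. I have used uniq from List::MoreUtils to remove any duplicates, and sorted the values. Note that List::MoreUtils is a CPAN module rather than a core one, so it may need to be installed first; the core List::Util module also exports uniq from version 1.45 onward.
Call the script like perl script.pl file.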
Answer 1 (score: 0)
A solution using awk:
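# Requires GNU awk (gawk): uses the four-argument split() and asort().
{
    result = "";
    count = split($0, fields, /[ ,-]+/, seps);
    for (i = 1; i <= count; i++) {
        if (fields[i] ~ /^[0-9]+$/) {
            if (seps[i] == ",") {
                numbers[fields[i]] = fields[i];
            } else if (seps[i] == "-") {
                # Start at fields[i] itself so the first value of the range is kept.
                for (j = fields[i] + 0; j <= fields[i+1]; j++) {
                    numbers[j] = j;
                }
            } else if (seps[i] == " ") {
                # End of the list: sort the collected values and append them.
                numbers[fields[i]] = fields[i];
                c = asort(numbers);
                for (r = 1; r < c; r++) {
                    result = result numbers[r] ",";
                }
                result = result numbers[c] " ";
                delete numbers;   # reset so values do not leak into the next line
            }
        } else {
            result = result fields[i] seps[i];
        }
    }
    print result;
}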
Answer 2 (score: 0)
$ cat tst.awk
match($0,/[0-9,-]+/) {
    split(substr($0,RSTART,RLENGTH),numsIn,/,/)
    numsOut = ""
    delete seen
    for (i=1;i in numsIn;i++) {
        n = split(numsIn[i],range,/-/)
        for (j=range[1]; j<=range[n]; j++) {
            if ( !seen[j]++ ) {
                numsOut = (numsOut=="" ? "" : numsOut ",") j
            }
        }
    }
    print substr($0,1,RSTART-1) numsOut substr($0,RSTART+RLENGTH)
}
$ awk -f tst.awk file
Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time
Answer 3 (score: 0)
Another awk:
$ awk '{while(match($0, /[0-9]+-[0-9]+/))
{k=substr($0, RSTART, RLENGTH);
split(k,a,"-");
f=a[1];
for(j=a[1]+1; j<=a[2]; j++) f=f","j;
sub(k,f)}}1' file
Files 1,2,3,4,5,6,7,8,9,10 are OK
Users 1,2,3,4,5,6,7,8,9,10 have problems
Cars 1,2,3,4,5,5,6,7,8,9,10 are in the depot
Trains 1,2,3,4,5,6,7,8,9,10 are on time
Note that because the ranges overlap, Cars 1-5,5-10 ends up with two 5 values when expanded.
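If the repeated value is unwanted, one possible follow-up is to expand the ranges the same way and then filter the resulting list through a seen array. The sketch below sticks to POSIX awk features; the function name expand_unique is hypothetical, and it assumes that only the first comma-separated list on each line needs deduplicating.

```shell
# expand_unique: hypothetical helper name, not part of the answer above.
expand_unique() {
    awk '{
        # Expand each a-b range in place.
        while (match($0, /[0-9]+-[0-9]+/)) {
            k = substr($0, RSTART, RLENGTH)
            split(k, a, "-")
            f = a[1]
            for (j = a[1] + 1; j <= a[2] + 0; j++) f = f "," j
            $0 = substr($0, 1, RSTART - 1) f substr($0, RSTART + RLENGTH)
        }
        # Keep only the first occurrence of each value in the list.
        if (match($0, /[0-9]+(,[0-9]+)+/)) {
            head = substr($0, 1, RSTART - 1)
            tail = substr($0, RSTART + RLENGTH)
            n = split(substr($0, RSTART, RLENGTH), v, ",")
            out = ""
            split("", seen)   # portable way to empty the array
            for (i = 1; i <= n; i++) {
                if (!(v[i] in seen)) {
                    seen[v[i]] = 1
                    out = (out == "" ? "" : out ",") v[i]
                }
            }
            $0 = head out tail
        }
        print
    }' "$@"
}

printf 'Cars 1-5,5-10 are in the depot\n' | expand_unique
# Cars 1,2,3,4,5,6,7,8,9,10 are in the depot
```

The range is spliced back in with substr rather than sub, which avoids treating the matched text as a regular expression. Values keep their order of first appearance instead of being sorted.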