How to add/remove numbers or number ranges in a file and reorganize the ranges

Asked: 2017-09-07 12:46:18

Tags: linux bash awk sed grep

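How can I add or remove individual numbers or ranges of numbers in a file and have the remaining ranges reorganized?

For example, given the file: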

$ cat test.in
cn[01-10]
cn01
cn[01,02,07-09]
cn[01-02]

The task is to remove cn01 and cn05.

Expected output:

$ cat test.in
cn[02-04,06-10]
cn[02,07-09]
cn[02]

2 answers:

Answer 0 (score: 0)

Here is how to expand lists and ranges of values into individual values:

$ cat tst.awk
function expand(exprStr,valsArr,        i,terms,term,range,val,numVals) {
    gsub(/cn|[][]/,"",exprStr)
    delete valsArr
    # exprStr = 01,02,07-09
    split(exprStr,terms,/,/)
    for (i=1; i in terms; i++) {
        # terms[1]=01, [2]=02, [3]=07-09
        term = terms[i]
        split(term,range,/-/)
        range[2] = (2 in range ? range[2] : range[1])
        for (val=range[1]; val<=range[2]; val++) {
            # range[1]=07, [2]=09
            valsArr[++numVals] = sprintf("%02d",val)
        }
    }
}
{
    print "--------", $0
    expand($0,arr)
    for (i=1; i<=length(arr); i++) {
        print i, "cn"arr[i]
    }
}

$ awk -f tst.awk file
-------- cn[01-10]
1 cn01
2 cn02
3 cn03
4 cn04
5 cn05
6 cn06
7 cn07
8 cn08
9 cn09
10 cn10
-------- cn01
1 cn01
-------- cn[01,02,07-09]
1 cn01
2 cn02
3 cn07
4 cn08
5 cn09
-------- cn[01-02]
1 cn01
2 cn02

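Now all that remains is to remove the unwanted values from the array and then do the reverse, recombining what is left into the input format.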
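The answer stops at the expansion step. As a rough sketch of the remaining "remove and recombine" step, here is one possible version written in Python rather than awk. The function name `collapse`, its `prefix` and `width` parameters, and the choice to return `None` for an empty result (so the line can be dropped, as in the expected output) are assumptions made for illustration and are not part of the original answer.

```python
def collapse(values, prefix="cn", width=2):
    """Collapse integers into bracketed list/range notation, e.g. cn[02-04,06-10].

    Hypothetical helper: returns None when nothing is left, so the caller
    can drop the line entirely.
    """
    nums = sorted(set(values))
    if not nums:
        return None
    runs = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:
            # Still inside the current run of consecutive numbers.
            prev = n
            continue
        # Gap found: close the current run and start a new one.
        runs.append((start, prev))
        start = prev = n
    runs.append((start, prev))
    parts = [
        f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
        for a, b in runs
    ]
    return f"{prefix}[{','.join(parts)}]"


# Zero-padded strings, shaped like what expand() produces for cn[01-10].
expanded = [f"{n:02d}" for n in range(1, 11)]
unwanted = {"01", "05"}
kept = [int(v) for v in expanded if v not in unwanted]
print(collapse(kept))  # cn[02-04,06-10]
```

The same loop translates fairly directly to awk: walk the surviving values in order and close a run whenever the next value is not the previous value plus one.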

Answer 1 (score: 0)

An example in Python 3:

import re
from itertools import groupby

inp = """cn[01-10]
cn01
cn[01,02,07-09]
cn[01-02]"""

rem = {1, 5}

def parse_lst(lst_str):
    for group in lst_str.split(','):
        if '-' in group:
            first, last = group.split('-')
            yield from range(int(first), int(last)+1)
        else:
            yield int(group)

def format_range(range_):
    ranges = []
    for k, g in groupby(enumerate(range_), lambda x: x[0]-x[1]):
        group = [n for i, n in g]
        ranges.append((group[0], group[-1]))

    if not ranges:
        return

    print("cn[" + ','.join(
        '{:02d}'.format(first) if first == last else
        '{:02d}-{:02d}'.format(first, last) for
        first, last in ranges
    ) + ']')

for line in inp.splitlines():
    lst_match = re.search(r'\[(.*)\]', line)
    if lst_match:
        range_ = parse_lst(lst_match.group(1))
    else:
        range_ = (int(line[2:]),)

    filtered = sorted(set(range_) - rem)
    format_range(filtered)

This prints:

cn[02-04,06-10]
cn[02,07-09]
cn[02]
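The grouping in `format_range` relies on the fact that, for sorted distinct integers, the difference between a value's position and the value itself stays constant across a run of consecutive numbers and changes at every gap. The following minimal sketch isolates that step; the function name `consecutive_runs` is hypothetical and only used here for illustration.

```python
from itertools import groupby


def consecutive_runs(sorted_values):
    """Split sorted, distinct integers into (first, last) runs."""
    runs = []
    # index - value is the same for every member of a consecutive run,
    # so groupby starts a new group exactly where a gap occurs.
    for _, grp in groupby(enumerate(sorted_values),
                          key=lambda pair: pair[0] - pair[1]):
        members = [value for _, value in grp]
        runs.append((members[0], members[-1]))
    return runs


print(consecutive_runs([2, 3, 4, 6, 7, 8, 9, 10]))
# [(2, 4), (6, 10)]
```

Note that the input must already be sorted and free of duplicates, which is why the answer applies `sorted(set(...))` before formatting.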