Question

I'm new to Linux, and I'm trying to parse a bunch of files that look like the following:

Some text
- start list some other text
  - start sublist1
  - continue sublist1
- more elements
- more elements2
  - sublist2
    - sub-sublist1
- another element

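All of the whitespace in front of the list items is tabs. I need a way to parse the text so that a colon is added to every line that has a sublist, so that in the end it looks like this: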

Some text:
- start list some other text:
  - start sublist1
  - continue sublist1
- more elements
- more elements2:
  - sublist2:
    - sub-sublist1
- another element

So a colon is added only when there is a sublist under the line.

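I've looked at the sed and awk commands, but I couldn't find anything that stores the state of the previous line, which I'd need in order to add the colon at its end. It doesn't have to be done in sed or awk; those are just what I've been trying, with no luck. Any suggestions would help.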
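The "state of the previous line" is just an ordinary awk variable: it keeps its value from one input line to the next. A minimal sketch of the delay-by-one-line idiom, assuming tab-only indentation as described above; `add_colons` is a hypothetical wrapper name used only for illustration:

```shell
#!/bin/sh
# add_colons is a hypothetical helper name; it reads the outline on stdin.
add_colons() {
    awk '
    # Number of leading tabs in s (assumes tab-only indentation).
    function depth(s) {
        match(s, /^\t*/)
        return RLENGTH
    }
    # From line 2 on, the previous line can be printed: we now know
    # whether the current line is nested more deeply under it.
    NR > 1 {
        if (depth($0) > depth(prev))
            print prev ":"
        else
            print prev
    }
    # prev is the "state of the previous line"; awk keeps its value
    # from one input line to the next.
    { prev = $0 }
    # The last line has nothing below it, so it never gets a colon.
    END { if (NR > 0) print prev }
    '
}

printf 'Some text\n\tstart list\n\t\tsublist1\n\tmore elements\n' | add_colons
```

Because each line is printed one record late, only the previous line has to be kept in memory, not the whole file.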

Answer 1

Something like this should solve your problem:

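awk '
    # Count the leading tab characters of a line.
    # tabs and i are extra parameters so that they stay local.
    function countTabs(line,    tabs, i) {
        tabs = 0
        i = 1                          # awk strings are 1-indexed
        while (substr(line, i++, 1) == "\t")
            tabs++
        return tabs
    }
{
    line1 = $0
    # Read one line ahead so the current line can be compared with the next.
    while ((getline line2) > 0) {
        if (countTabs(line1) < countTabs(line2))
            printf("%s:\n", line1)     # next line is deeper: line1 opens a sublist
        else
            printf("%s\n", line1)
        line1 = line2
    }
    print line1                        # the last line never has a sublist
}' file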

Answer 2

Something to try:

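awk '
{
    A[d++]=$0                          # store the line (indices start at 0)
    match($0,"[^[:blank:]]")           # RSTART = column of the first non-blank character
    if ( RSTART > t ){    A[d-1]=A[d-1]":"  }   # deeper than the previous line: add a colon to this line
    else{  gsub(/:$/,"",A[d-2])  }     # not deeper: drop the colon from the previous line
    t=RSTART
}
END{
    for(i=0;i<d;i++){                  # indices 0..d-1 hold the d stored lines
        print A[i]
    }
} ' file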

Output:

$ cat file
Some text
        start list some other text
                start sublist1
                continue sublist1
        more elements
        more elements2
                a sublist2
                        a sub-sublist1
                                a sub-sublist2
        another element

$ ./shell.sh
Some text:
        start list some other text:
                start sublist1
                continue sublist1
        more elements
        more elements2
                a sublist2:
                        a sub-sublist1:
                                a sub-sublist2
        another element

Answer 3

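A modified version of ghostdog74's script should do the job. It appends the colon to the previous line whenever the current line is indented more deeply, so "more elements2" also gets its colon: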

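awk '
{
    A[NR]=$0                           # remember every line
    match($0,"[^[:blank:]]")           # RSTART = column of the first non-blank character
    if ( RSTART > t ){ A[NR-1]=A[NR-1]":" }   # deeper than the previous line: the previous line opens a sublist
    t=RSTART                           # this line's indentation, for the next comparison
}
END{
    for(i=1; i<=NR; i++){
        print A[i]
    }
} ' file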

Separating sublists with colons