Question

I have a file in the following format:

file header string(s)
"section title" : [status]
unknown
text

"next section" : [different_status]
different
amount of

strings

I would like to break it into the following sections:

file header string(s)

and

"section title" : [status]
unknown
text

and

"next section" : [different_status]
different
amount of

strings

Capturing the file header string is not important, though.

As you can see, the pattern I can rely on for splitting is:

"string in quotes" : [string in square brackets]

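This delimiter line needs to be captured as well, as the first line of each section.

What is a simple way to do this in a bash script? I suspect awk can do it, but my awk-fu is weak.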

Answer 1

A Perl solution:

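#!/usr/bin/perl
use warnings;
use strict;

# Lines before the first delimiter (the file header) go to section-0.
my $output = 0;
open my $OUT, '>', "section-$output" or die $!;
while (<>) {
    # A line like "title" : [status] starts the next output file.
    if (/"[^"]*" : \[[^\]]*\]/) {
        $output++;
        open $OUT, '>', "section-$output" or die $!;
    }
    print {$OUT} $_;
}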
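
For comparison, since the question asked about awk, here is a rough awk sketch of the same idea as the Perl loop above. It is not part of the original answer: the function name split_sections and the optional output prefix are made up for illustration, and the regex is anchored at the start of the line, which the Perl version does not do.

```shell
#!/bin/bash
# split_sections INPUT [PREFIX]   (hypothetical helper, for illustration only)
# Writes the file header to PREFIX-0 and each section to PREFIX-1, PREFIX-2, ...
split_sections() {
    awk -v prefix="${2:-section}" '
        BEGIN { n = 0; out = prefix "-" n }
        # A delimiter line closes the current file and starts the next one.
        /^"[^"]*" : \[[^]]*\]/ {
            close(out)
            n++
            out = prefix "-" n
        }
        # Every line, delimiter included, goes to the current output file.
        { print > out }
    ' "$1"
}
```

Called as `split_sections inputfile`, this should leave files named section-0, section-1, and so on in the current directory. Unlike the Perl script, it does not create section-0 when the input has no header lines.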

Answer 2

Here is a way to do it in pure Bash:

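#!/bin/bash

# Keep the regex in a variable: quoting the right-hand side of =~
# makes Bash (3.2 and later) match it as a literal string.
re='^"[^"]*" : \[[^]]*\]'
i=0

while IFS= read -r line; do
    [[ $line =~ $re ]] && (( ++i ))
    (( i > 0 )) && echo "SECTION_$i: $line"
done < "$1"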

Update: improved the regular expression.

Answer 3

This should be a one-liner in awk. Assuming I am interpreting your divider lines correctly, how about this?

awk '/^"[^"]+" : \[[^]]+\]$/ { printf("\n"); } 1' inputfile > outputfile

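The "1" at the end is a shortcut for "print the current line": it is a pattern that is always true, with the default action of printing. The pattern and action pair before it prints a blank line first whenever the current line matches the pattern.

You can do the same thing in a sed one-liner: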
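
To make the behaviour concrete, here is a small sketch that wraps the awk one-liner in a function and feeds it a few lines in the format from the question. The function name add_blank_before_sections is hypothetical and only there so the snippet is self-contained.

```shell
#!/bin/bash
# add_blank_before_sections: hypothetical wrapper around the awk one-liner;
# reads stdin and writes stdout.
add_blank_before_sections() {
    # First rule: emit an empty line before each delimiter line.
    # Second rule: the bare 1 is always true, so every line is printed.
    awk '/^"[^"]+" : \[[^]]+\]$/ { printf("\n") } 1'
}

printf '%s\n' \
    'file header string(s)' \
    '"section title" : [status]' \
    'unknown' \
    'text' |
    add_blank_before_sections
```

The header line comes out unchanged, followed by one empty line, then the delimiter line and the two lines after it.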

sed -r '/^"[^"]+" : \[[^]]+\]$/{x;p;x;}' inputfile > outputfile

This uses the magic of sed's "hold space". You can run man sed for details on how x works.
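
As a rough illustration of the hold-space trick, the sketch below wraps the same sed script in a function. The function name is made up for this example, and -E is used here in place of -r: both select extended regular expressions in GNU sed, and -E is also accepted by BSD sed.

```shell
#!/bin/bash
# blank_before_sections_sed: hypothetical wrapper; reads stdin, writes stdout.
# On a delimiter line the three commands do the following:
#   x  swap the pattern space with the hold space, which is still empty
#   p  print the pattern space, i.e. an empty line
#   x  swap back, so the delimiter line is printed normally at end of cycle
blank_before_sections_sed() {
    sed -E '/^"[^"]+" : \[[^]]+\]$/{x;p;x;}'
}

printf '%s\n' \
    'file header string(s)' \
    '"section title" : [status]' \
    'unknown' |
    blank_before_sections_sed
```

Because nothing is ever stored in the hold space, it stays empty for the whole run, which is why the printed line is always blank.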

Splitting a string on a pattern using bash and/or awk
