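I have a file that I want to process using bash. awk, sed, grep or a similar tool would all be fine. A pattern1 ... pattern2 pair occurs several times on a single line of the file. I would like to extract everything from each pattern1 up to the next pattern2 and print each match on a separate line.
I have already tried this: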
cat file.txt | grep -o 'pattern1.*pattern2'
but that prints everything from the first pattern1 up to the last matching pattern2, because .* is greedy.
$ cat file.txt
pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.
I would like to get:
pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
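The greedy behaviour described above can be reproduced without the file. This is a minimal sketch that assumes a grep supporting -o (as in the attempt above); the variable names line and greedy are illustrative only.

```shell
# Sample line from the question, held in a variable instead of file.txt.
line='pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.'

# .* is greedy: it runs on to the LAST pattern2 on the line, so grep -o
# reports one long match rather than two short ones.
greedy=$(printf '%s\n' "$line" | grep -o 'pattern1.*pattern2')

printf '%s\n' "$greedy"
printf 'number of matches: %s\n' "$(printf '%s\n' "$greedy" | wc -l | tr -d ' ')"
```

The single reported match spans both pairs, which is why a per-pair extraction needs either a non-greedy match or a loop such as the ones in the answers.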
Answer 0 (score: 1)
This might work for you (GNU sed):
sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}' file
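Set the option -n so that printing is explicit.
Process only lines that contain pattern1 followed by pattern2.
Prepend a newline to the first pattern1.
Remove everything up to and including the introduced newline.
Append a newline after the first pattern2.
Print the first line of the pattern space (P), delete it (D), and repeat on whatever is left without reading new input.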
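The steps above can be checked on a small input. The sketch below assumes GNU sed (for \n in the replacement and the P;D restart behaviour); the sample text and the variable names input and extracted are made up for illustration.

```shell
# Two input lines: one with two pattern1...pattern2 pairs, one with none.
input='pattern1 first pattern2 filler pattern1 second pattern2 tail
a line without either marker'

# Same command as in the answer, reading from stdin instead of a file.
# Each pass through the braces peels off one pair; D restarts the cycle
# on the remainder until pattern1.*pattern2 no longer matches.
extracted=$(printf '%s\n' "$input" |
  sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}')

printf '%s\n' "$extracted"
```

The leftover " tail" and the second input line never match the address, so with -n they are dropped silently.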
Answer 1 (score: 0)
Try GNU sed:
sed -E 's/(pattern2).*(pattern1)(.*\1).*/\1\n\2\3/' file.txt
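One property of this command is worth checking before relying on it: a single s command without the g flag inserts a single newline, so it can yield at most two output lines per input line. The sketch below assumes GNU sed; the inputs and the variable names are invented for illustration.

```shell
two='pattern1 A pattern2 x pattern1 B pattern2 y'
three='pattern1 A pattern2 x pattern1 B pattern2 y pattern1 C pattern2 z'

# Same substitution as in the answer, applied to stdin.
split_pairs() {
  sed -E 's/(pattern2).*(pattern1)(.*\1).*/\1\n\2\3/'
}

split_two=$(printf '%s\n' "$two" | split_pairs)
split_three=$(printf '%s\n' "$three" | split_pairs)

printf 'two pairs:\n%s\n' "$split_two"
printf 'three pairs give %s lines\n' "$(printf '%s\n' "$split_three" | wc -l | tr -d ' ')"
```

With exactly two pairs, as in the question's sample, both are extracted. With three pairs one of them is lost, so the loop-based answers are the safer choice when the number of pairs per line is not fixed.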
Answer 2 (score: 0)
If you don't have access to a tool that supports lookarounds, this approach is lengthy but works robustly using standard tools on any UNIX box:
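awk '{
    # Escape literal @, { and } so that { and } are free to stand in for the patterns
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    # Collect each shortest {...} span, one per output line
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    # Restore the patterns first, then the escaped literals
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
} 1' file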
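The escaping matters only when the input already contains @, { or }, so it is worth exercising with such input. Below is a minimal sketch of the same technique wrapped in a shell function. The function name extract_pairs and the sample text are hypothetical, the restore step is written as @C to } and @B to {, and the braces are written as [{] and [}] on the assumption that bracket expressions are the more portable way to match a literal brace.

```shell
# Hypothetical wrapper: reads lines on stdin, prints one pattern1...pattern2 span per line.
extract_pairs() {
  awk '{
    # Escape @, { and } so that { and } can serve as one-character markers.
    gsub(/@/,"@A"); gsub(/[{]/,"@B"); gsub(/[}]/,"@C")
    gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    # [^{}]* cannot cross a marker, so each match is the shortest span.
    while ( match($0, /[{][^{}]*[}]/) ) {
      out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
      $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    # Undo in reverse order: markers first, then the escaped literals.
    gsub(/[}]/,"pattern2"); gsub(/[{]/,"pattern1")
    gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
  } 1'
}

sample='pattern1 a {b} @c pattern2 junk pattern1 d pattern2 end'
result=$(printf '%s\n' "$sample" | extract_pairs)
printf '%s\n' "$result"
```

The literal {b} and @c inside the first span come back unchanged, which is the point of the escape and restore steps.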
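This awk script works by creating characters that cannot be present in the input: it first changes the literal characters { and } to the strings @B and @C (and @ to @A, so those strings stay unambiguous). pattern1 and pattern2 can then be replaced by { and }, which lets a negated character class, [^{}], find each target string. Finally, every changed character is returned to its original value. Here it is with some prints to make it more obvious what happens at each step: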
awk '{
    print "1): " $0 ORS
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    print "2): " $0 ORS
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    print "3): " $0 ORS
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    print "4): " $0 ORS
} 1' file
1): pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.
2): { this is the first content } this is some other stuff { this is the second content } this is the end of the file.
3): { this is the first content }
{ this is the second content }
4): pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
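For comparison, where a tool with Perl-style regular expressions is available, a non-greedy quantifier does the same job in one step. This sketch assumes GNU grep built with PCRE support; -P is not part of POSIX grep and is missing from some systems, which is the situation the awk approach is meant for. The variable names line and lazy are illustrative.

```shell
line='pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.'

# .*? is the lazy form of .* and stops at the FIRST pattern2 after each pattern1.
lazy=$(printf '%s\n' "$line" | grep -oP 'pattern1.*?pattern2')

printf '%s\n' "$lazy"
```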