How to put regex match groups into separate output columns, correctly handling missing/empty values?

Date: 2017-09-01 14:57:30

Tags: bash grep text-processing

Suppose I have the following file:

This file has two lines
This file has three lines
This file has four
This file has five lines

I want to grep for "file" and "lines" so that I get the following output:

file lines
file lines
file
file lines

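If both matches are found on a line, print them on the same output line. If only one is found, print it, leave a placeholder (null/blank/whatever), and move on to the next line.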

I tried this:

grep -oP '(file)|(lines)' example.txt | paste -d ' ' - -

But I get:

file lines
file lines
file file
lines

Because "lines" is not found on the third line, the "file" match from the next line gets paired with it on the same output line.

I'm basically forcing paste to fill every slot in the output, regardless of what was actually found on each input line.

How can I change this?
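The root cause is that grep -o prints each match on its own line and discards which input line it came from, so paste can only pair matches blindly. A minimal sketch that keeps that information, assuming GNU grep (for -o combined with -n), tags every match with its line number and lets awk regroup the matches per line. The function name group_matches is hypothetical, and the word-to-column mapping is hard-coded for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: regroup grep -o matches by the input line they came from.
# Assumes GNU grep; -n prefixes each match with "LINENO:".
group_matches() {
  grep -no -E 'file|lines' |
    awk -F: '
      # new line number: flush the previous line'"'"'s columns and reset them
      $1 != prev { if (NR > 1) print f, l; f = l = ""; prev = $1 }
      # put each match into its own column so a missing one leaves a blank
      $2 == "file"  { f = $2 }
      $2 == "lines" { l = $2 }
      END { if (NR > 0) print f, l }'
}

printf '%s\n' 'This file has two lines' 'This file has three lines' \
  'This file has four' 'This file has five lines' | group_matches
```

Input lines with no match at all produce no output, and a line with only "file" prints "file" followed by an empty second column.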

2 Answers:

Answer 0 (score: 2)

I'm assuming that "file" and "lines" are actually regexes with their own match groups. The following allows any ERE:

#!/usr/bin/env bash

# replace these with any ERE-compliant regex of your choice
file_re='(file)'    # for instance: file_re='file=([^[:space:]]+)([[:space:]]|$)'
lines_re='(lines)'

while IFS= read -r line; do
  # default to a blank placeholder if no matches exist
  file= lines=

  # compare against each regex; if one matches, assign the group contents to a variable
  [[ $line =~ $file_re ]] && file=${BASH_REMATCH[1]}
  [[ $line =~ $lines_re ]] && lines=${BASH_REMATCH[1]}

  # print a line of output if *either* regex matched.
  [[ $file || $lines ]] && printf '%s\t%s\n' "$file" "$lines"

done <"${1:-example.txt}" # with input from $1 if given, or example.txt otherwise

See BashFAQ #1 ("How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?") for a description of the techniques used here.
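One detail from BashFAQ #1 that applies to read loops like this: a plain `while IFS= read -r line` loop skips a final line that has no trailing newline, because read returns non-zero at EOF even though it has already filled `line`. A minimal sketch of the usual workaround (count_lines is a hypothetical helper used only to make the effect visible):

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts input lines, including a last line without "\n".
count_lines() {
  local line n=0
  # read fails at EOF, but "|| [[ -n $line ]]" still admits a partial last line
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
  done
  echo "$n"
}

printf 'first\nsecond' | count_lines   # prints 2; without the || test it would print 1
```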

Given your input, the output is:

file    lines
file    lines
file
file    lines
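Since the script takes group 1 of each regex from BASH_REMATCH, it works the same way with patterns whose capture group is only part of the match. A sketch with a hypothetical "file=NAME lines=N" input format; the regexes, sample input and the extract function name are illustrative assumptions, not part of the question:

```shell
#!/usr/bin/env bash
# Illustrative regexes for a hypothetical "file=NAME lines=N" format;
# group 1 of each ends up in the corresponding output column.
file_re='file=([^[:space:]]+)'
lines_re='lines=([0-9]+)'

extract() {
  local line file lines
  while IFS= read -r line; do
    file= lines=
    [[ $line =~ $file_re ]] && file=${BASH_REMATCH[1]}
    [[ $line =~ $lines_re ]] && lines=${BASH_REMATCH[1]}
    # a missing match leaves an empty column instead of shifting later values
    if [[ $file || $lines ]]; then
      printf '%s\t%s\n' "$file" "$lines"
    fi
  done
}

printf '%s\n' 'file=a.txt lines=10' 'file=b.txt' 'lines=7' 'nothing here' | extract
```

Here 'file=b.txt' yields an empty second column, 'lines=7' an empty first column, and 'nothing here' is skipped.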

Answer 1 (score: 0)

sed is for s/old/new/, grep is for g/re/p. For any other text manipulation you should use awk.

With GNU awk, for the 3rd argument to match():

$ awk '{f=match($0,/file/,a); f+=match($0,/lines/,b)} f{print a[0], b[0]}' file
file lines
file lines
file
file lines

With other awks, you can use substr() to capture the matched string.
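A minimal sketch of that substr() approach, assuming any POSIX awk (an illustration, not the answerer's own command): match() sets RSTART and RLENGTH, and substr($0, RSTART, RLENGTH) recovers the matched text. extract_awk is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: portable equivalent of gawk's match($0, re, arr),
# using the RSTART/RLENGTH variables that match() sets in every POSIX awk.
extract_awk() {
  awk '{
    f = l = ""
    if (match($0, /file/))  f = substr($0, RSTART, RLENGTH)
    if (match($0, /lines/)) l = substr($0, RSTART, RLENGTH)
    # print only lines where at least one regex matched
    if (f != "" || l != "") print f, l
  }' "$@"
}

printf '%s\n' 'This file has two lines' 'This file has three lines' \
  'This file has four' 'This file has five lines' | extract_awk
```

Pass a filename (for example extract_awk example.txt) to read a file instead of stdin.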