Question

I'm trying to extract matching groups from a string. I came up with this pattern using Rubular:

\[(.*?)\]

Rubular seems to extract the expected groups for the following string:

1547156981784: Rendered [ Code128 ] with [ this_is_a_test ] in [ 12ms ] size [ 385B ] using [ http://barcodeapi.org/index.html ] for [ 1.2.3.4 ] via [ 5.6.7.8 ]

1: Code128
2: this_is_a_test
3: 12ms
4: 385B
5: http://barcodeapi.org/index.html
6: 1.2.3.4
7: 5.6.7.8

The problem is that I'm trying to implement this regex in a Bash script to parse a log file:

reg='\[(.*?)\]'
while read line; do
  if [[ $line =~ $reg ]]; then
    echo ${BASH_REMATCH[1]};
  fi
done < $log

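However, the results differ from Ruby/Rubular. In Bash, match group #1 contains the entire string minus the first and last brackets. For the same log line, Bash returns only a single match: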

1: Code128 ] with [ this_is_a_test ] in [ 12ms ] size [ 385B ] using [ http://barcodeapi.org/index.html ] for [ 1.2.3.4 ] via [ 5.6.7.8

So the questions are:

Why do the two engines give different results, and how do I correctly separate the groups in Bash?

Answer 1

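A few issues:

There is no global match in Bash;
You have to loop over multiple matches yourself in Bash and manage the string index by hand;
Bash regexes are POSIX ERE, which has no non-greedy quantifiers, so .*? does not work the way it does in Ruby.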
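
To illustrate the last point, here is a minimal sketch. The helper name first_group and the sample string are made up for this example. Because ERE has no lazy quantifier, the usual workaround is a negated bracket expression such as [^]]* ("anything except a closing bracket"), which stops at the first ] on its own:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print capture group 1 of an ERE applied to a string.
first_group() {
    local re=$1 text=$2
    [[ $text =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

line='a [x] b [y]'

# Greedy .* runs from the first '[' to the last ']'.
first_group '\[(.*)\]' "$line"       # prints: x] b [y

# A negated bracket expression stops at the first ']'.
first_group '\[([^]]*)\]' "$line"    # prints: x
```

Keeping the pattern in a variable, as done here, also sidesteps the quoting rules that apply to a regex written inline after =~.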

You can use this as a starting point:

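while IFS= read -r line; do
    # No global match in Bash: consume the line one bracketed group at a time.
    while [[ $line =~ ([^\[]*)\[([^\]]*)\] ]]; do
        echo "${BASH_REMATCH[2]}"   # text between the brackets
        i=${#BASH_REMATCH}          # length of the whole match (element 0)
        line=${line:i}              # drop the part that was just matched
    done
done < file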

Prints:

 Code128 
 this_is_a_test 
 12ms 
 385B 
 http://barcodeapi.org/index.html 
 1.2.3.4 
 5.6.7.8
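
The leading and trailing spaces in that output come from the spaces inside the brackets in the log line. As a rough sketch of how the same loop could be packaged, the function below (the name extract_brackets is hypothetical) keeps the pattern in a variable, trims surrounding whitespace from each value, and then cuts the consumed prefix off the string:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every [ ... ] value in a string, one per line,
# with surrounding whitespace trimmed.
extract_brackets() {
    local rest=$1 m
    local re='\[([^]]*)\]'              # '[', anything but ']', then ']'
    while [[ $rest =~ $re ]]; do
        m=${BASH_REMATCH[1]}
        m=${m#"${m%%[![:space:]]*}"}    # strip leading whitespace
        m=${m%"${m##*[![:space:]]}"}    # strip trailing whitespace
        printf '%s\n' "$m"
        # Drop everything up to and including the match just handled.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

extract_brackets 'x [ a ] y [b] z [ c d ]'
```

With that sample argument the function should print a, b and c d on separate lines. To process a log file, call it once per line inside a while IFS= read -r line loop.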

If you instead use Perl, GNU grep, Ruby, etc. to build the list of matches and then loop over that list in Bash, you will save yourself a lot of trouble:

while IFS= read -r m; do
    echo "Match: $m"
done < <(ggrep -oP '(?<=\[)(.*?)(?=\])' file)  # GNU grep is installed as ggrep here
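
If the available grep has no PCRE support (-P), a similar list can probably be produced with plain -o and a negated bracket expression, stripping the brackets afterwards with sed. This is only a sketch: the function name bracket_matches is made up, and -o is not required by POSIX, although both GNU and BSD grep provide it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the contents of every [ ... ] group, one per line.
# Reads the files given as arguments, or standard input if there are none.
bracket_matches() {
    grep -o '\[[^]]*]' "$@" |     # each match, brackets included
        sed 's/^\[//; s/]$//'     # strip the enclosing brackets
}

# Loop over the matches in Bash, as in the PCRE version.
while IFS= read -r m; do
    echo "Match: $m"
done < <(printf '%s\n' 'a [x] b [ y ]' | bracket_matches)
```

Whitespace inside the brackets is preserved, so the second match here is " y " with its spaces.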

If your code has to be POSIX, use awk:

$ awk -v 'RS=[' -v 'FS=]' 'NR>1{print $1}' file
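
The awk command works by making [ the record separator and ] the field separator: every record after the first starts just past an opening bracket, and its first field ends at the matching closing bracket. A small sketch of the same idea wrapped in a shell function follows. The name awk_brackets is hypothetical, and the gsub call is an addition that trims spaces and tabs around each value:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk approach.
awk_brackets() {
    awk -v 'RS=[' -v 'FS=]' '
        NR > 1 {                              # record 1 is the text before the first "["
            gsub(/^[ \t]+|[ \t]+$/, "", $1)   # trim spaces and tabs around the value
            print $1
        }' "$@"
}

printf '%s\n' 't: Rendered [ a ] with [b]' | awk_brackets
```

For that sample line this should print a and b on separate lines.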

Regex in Bash returns different results than Ruby
