Question

I have a file named "align_summary.txt" that looks like this:

Left reads:


Input     :  26410324

   Mapped   :  21366875 (80.9% of input)

   of these:    451504 ( 2.1%) have multiple alignments (4372 have >20)

...more text....

... and several more lines of text....

In a bash shell, I want to extract the percentage of left reads that have multiple alignments (2.1 in this example).

If I use this:

 pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p

it immediately gives me the output: 2.1

However, if I enclose the same expression in backticks like this:

leftmultiple=`pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p`

I get the error:

awk: syntax error in regular expression (  at 
  input record number 1, file 
  source line number 1

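As I understand it, enclosing this expression in backticks changes how the regular expression is interpreted, including the "(" character, even though it is escaped with a backslash.

Why does this happen, and how can I avoid this error?

I would appreciate any comments and suggestions.

Many thanks,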

Answer 1

Just use awk:

leftmultiple=$(awk '/these:.*multiple/{sub(" ","",$2);print $2}' FS='[(%]' align_summary.txt )

Answer 2

Always use $(...) instead of backticks, but more importantly, just use awk:

$ leftmultiple=$( gawk -v RS='^$' 'match($0,/Left reads.\s*\n\s+.+\n\s+Mapped.+.\n.\s+of these[^(]+[(]\s*([^)%]+)/,a) { print a[1] }' align_summary.txt )
$ echo "$leftmultiple"
2.1

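The above uses GNU awk 4.* and assumes that you really do need the complex regular expression to avoid false matches elsewhere in the input file. If that is not the case, the script can of course be made simpler.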
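
As for why the backtick version fails: inside backticks the shell strips one level of backslashes (a backslash followed by $, ` or \) before the command is even parsed, whereas $(...) leaves the text alone. The following is a minimal sketch of that difference, using the same "\\\( " string as the question; the variable names with_backticks and with_dollar are only illustrative.

```shell
# The same double-quoted string, captured with each form of command substitution.
# Variable names are illustrative, not part of the original question.
with_backticks=`printf '%s' "\\\( "`
with_dollar=$(printf '%s' "\\\( ")

# Backticks: \\ is collapsed to \ first, so the inner command sees "\\( "
# and the double quotes reduce that to a single backslash before the paren.
printf 'backticks: [%s]\n' "$with_backticks"

# $(...): the inner command sees "\\\( " unchanged, which the double quotes
# reduce to two backslashes before the paren.
printf 'dollar:    [%s]\n' "$with_dollar"
```

With two backslashes, awk's own escape processing of the -F value leaves an escaped, literal parenthesis. With only one, awk ends up with a bare "(" in the regular expression, which is consistent with the syntax error shown in the question. Switching to $(...), or adding extra backslashes inside the backticks, avoids the problem.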

Bash: regular expression in backticks
