awk next and pattern matching

Time: 2015-03-19 00:19:55

Tags: bash awk next

Given the following CSV file, we only want $9 from the "DELTA Energy Terms" section, excluding the line that starts with "Frame".

Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
1,0.0,0.0,-33.1958,2.71419624,80.6403,0.0,-30.48160376,50.15869624

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
2,-42.5618,0.0,44.0748,-5.2738956,-26.6719,-42.5618,38.8009044,-30.4327956
3,-43.1034,0.0,41.3681,-5.25029544,-27.1501,-43.1034,36.11780456,-34.13569544

Expected output:

-31.6726
-34.9402
-30.4327
-34.1356

The following attempts print every $9, including those in the "Ligand Energy Terms" section.

awk -F, '$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'

awk -F, '$1 ~ /DELTA Energy Terms/ {next}  {printf("%24.4f\n",$9)}'

Can any guru enlighten me?

4 answers:

Answer 0: (score: 1)

The following should do the trick:

awk -F, '/^DELTA/ {capture=1} /Energy Terms$/ {next} /^Frame/ {next} (capture) {print $9}'

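I use a capture flag to control whether an individual record should be captured. By default, capture is zero. Once the DELTA Energy Terms line has been parsed, I start capturing. I skip every line that ends with Energy Terms or starts with Frame. Otherwise, if we are "capturing", I print the ninth field.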

If you run this often, I suggest using the following script:

#!/usr/bin/awk -f
BEGIN {
    FS = ","
}
/^DELTA Energy Terms/ {
    capture = 1;
    next
}
/Energy Terms$/ {
    capture = 0;
    next
}
/^Frame/ { next }
(capture) { print $9 }

Save the script as extract-delta and make it executable; you can then use it like any other shell command:

$ cat input-file | tr -d '\015' | ./extract-delta
-31.67263392
-34.9402544
-30.4327956
-34.13569544
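The tr -d '\015' step above strips carriage returns, which matters when the CSV has Windows line endings. As a rough sketch of the same flag technique, the carriage return can also be removed inside awk, and blank lines skipped explicitly. The sample variable and the delta_totals name are made up for illustration; the data is a shortened copy of the file in the question.

```shell
# Hypothetical sample with CRLF line endings (shortened copy of the question's data).
sample=$(printf '%s\r\n' \
  'Ligand Energy Terms' \
  'Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL' \
  '0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904' \
  '' \
  'DELTA Energy Terms' \
  'Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL' \
  '0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392' \
  '1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544')

delta_totals=$(printf '%s\n' "$sample" | awk -F, '
  { sub(/\r$/, "") }                            # drop a trailing carriage return, if any
  /^DELTA Energy Terms$/ { capture = 1; next }  # entering the wanted section
  /Energy Terms$/        { capture = 0; next }  # any other section title
  /^Frame/ || NF == 0    { next }               # header lines and blank lines
  capture                { print $9 }           # data lines inside the DELTA section
')

printf '%s\n' "$delta_totals"
```

Modifying $0 with sub() makes awk split the fields again, so $9 no longer carries the carriage return.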

Answer 1: (score: 0)

You can try the awk command below.

$ awk -v RS="\n\n" -v FS="\n" '/^DELTA Energy Terms/{for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}}' RS=  file
-31.67263392
-34.9402544
-30.4327956
-34.13569544
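  • RS="\n\n" sets a blank line as the record separator. (The trailing RS= operand before the file name then resets RS to the empty string, which is awk's paragraph mode and also splits records on blank lines.)
  • FS="\n" sets the newline character as the field separator.
  • /^DELTA Energy Terms/ runs the following action only on a record that starts with DELTA Energy Terms.
  • {for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}} loops over every field except 1 and 2, splits each field on commas, and stores the resulting pieces in an array named a.
  • print a[9] prints the element at index 9 of the associative array a.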
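The steps in the list above can be sketched with paragraph mode alone, which is standard awk behaviour, whereas a multi-character RS such as "\n\n" is not supported by every awk. The sample data and the para_totals name are assumptions for illustration only.

```shell
# Hypothetical sample: shortened copy of the question's data.
sample='Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544'

para_totals=$(printf '%s\n' "$sample" | awk '
  BEGIN { RS = ""; FS = "\n" }     # paragraph mode: each blank-line-separated block is one record
  /^DELTA Energy Terms/ {
    for (i = 3; i <= NF; i++) {    # fields 1 and 2 are the title and the header line
      split($i, a, ",")            # break one data line into its comma-separated columns
      print a[9]
    }
  }
')

printf '%s\n' "$para_totals"
```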

Answer 2: (score: 0)

You can also do this in bash as follows:

tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) input.txt | cut -d"," -f9

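The tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) part prints the lines of the input file starting at the line number of the line containing DELTA Energy Terms plus 2; cut then gives you the 9th field you are looking for.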
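The same pipeline may be easier to follow when the line number is stored in a variable first. This is only a sketch: the temporary file, the start variable, and the tail_totals name are assumptions, and it relies on the title occurring exactly once and on the DELTA section being the last one in the file.

```shell
# Hypothetical input file: shortened copy of the question's data.
input=$(mktemp)
cat > "$input" <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF

# Line number of the section title (grep -n prints "N:matched text").
start=$(grep -n 'DELTA Energy Terms' "$input" | cut -d: -f1)

# The title is on line start and the header on start+1, so the data begins at start+2.
tail_totals=$(tail -n +"$((start + 2))" "$input" | cut -d, -f9)

rm -f "$input"
printf '%s\n' "$tail_totals"
```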

Answer 3: (score: 0)

All of these solutions work and so solve the immediate problem, but none of them answers the implied question.

Looking at the command in question, why doesn't this work?

'$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'

Let's break it down.

# Skip every line where the first field matches.
$1 ~ /DELTA Energy Terms/ {next}
  # As quoted here, without -F, this rule matches no line:
  # the field separator defaults to white space, so the first field of the
  # title line is "DELTA", not "DELTA Energy Terms".
  # With -F, as in the question, the rule does match, but next only skips
  # that one title line; it has no effect on any later line.

# Skip every line where the first field matches "Frame".
$1 ~ /Frame/ {next}
  # This matches and gets skipped.

# Print the 9th field of every line that didn't get skipped.
{printf("%24.4f\n",$9)}
  # Nothing restricts this rule to the DELTA section, so the data lines of
  # both sections are printed. Lines without a field 9 (a title line that was
  # not skipped, blank lines) do not print as blanks: %f formats the empty
  # string as 0.0000.
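To make the breakdown concrete, here is a small sketch that runs the question's program with -F, on a shortened, made-up copy of the data, followed by one possible fix that remembers the current section in a variable. The names sample, attempt, fixed and in_delta are assumptions, and %.4f is used instead of %24.4f only to keep the output free of padding.

```shell
# Hypothetical sample: shortened copy of the question's data.
sample='Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544'

# The attempt from the question: next drops single lines but keeps no state,
# so the Ligand title, the Ligand data line and the blank line all reach printf.
attempt=$(printf '%s\n' "$sample" | awk -F, '
  $1 ~ /DELTA Energy Terms/ { next }
  $1 ~ /Frame/              { next }
  { printf("%.4f\n", $9) }
')

# One possible fix: a flag records whether the current line is inside the DELTA section.
fixed=$(printf '%s\n' "$sample" | awk -F, '
  /^DELTA Energy Terms/ { in_delta = 1; next }
  /Energy Terms/        { in_delta = 0; next }
  /^Frame/ || NF == 0   { next }
  in_delta              { printf("%.4f\n", $9) }
')

printf 'attempt:\n%s\nfixed:\n%s\n' "$attempt" "$fixed"
```

Note that %.4f rounds, so -34.9402544 prints as -34.9403, while the expected output in the question shows -34.9402, which is the truncated value.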