Question

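A second question about processing log files that contain a table. I am analysing a large number of dlg text files located in workdir. Each file contains a table in the following format (usually near the end of the log):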

 RMSD TABLE
    __________


_____________________________________________________________________
     |      |      |           |         |                 |
Rank | Sub- | Run  | Binding   | Cluster | Reference       | Grep
     | Rank |      | Energy    | RMSD    | RMSD            | Pattern
_____|______|______|___________|_________|_________________|___________
   1      1      7       -1.43      0.00    178.12           RANKING
   1      2     18       -0.96      1.88    177.35           RANKING
   2      1      4       -0.97      0.00    178.43           RANKING
   3      1     13       -0.60      0.00    178.03           RANKING
   4      1      5       -0.56      0.00    198.10           RANKING
   5      1     16       +0.01      0.00    189.71           RANKING
   6      1      3       +0.06      0.00    176.95           RANKING
   7      1     19       +0.10      0.00    177.27           RANKING
   8      1     17       +0.13      0.00    177.60           RANKING
   9      1      8       +0.20      0.00    177.05           RANKING
  10      1     20       +0.27      0.00    177.43           RANKING
  11      1     10       +0.34      0.00    176.33           RANKING
  12      1      6       +0.37      0.00    177.30           RANKING
  13      1      9       +0.44      0.00    175.48           RANKING
  14      1      2       +0.46      0.00    175.67           RANKING
  15      1     11       +0.84      0.00    177.52           RANKING
  15      2     12       +1.31      1.95    178.03           RANKING
  16      1     14       +1.29      0.00    201.01           RANKING
  17      1     15       +1.65      0.00    175.50           RANKING
  18      1      1       +1.96      0.00    186.83           RANKING

Run time 3.909 sec
Idle time 0.817 sec

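The goal is to loop over all the .dlg files and, from each table, take the single line corresponding to the first row (ignoring the header), omitting the last column (which is normally used for grep identification). In the example table above, that is the first data row: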

      1      1      7       -1.43      0.00    178.12

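Then I need to append this line to final_log.txt, together with the name of the log file (which should be written before it). Based on my recent experience, a possible model of my BASH workflow (for treating multiple files) could be: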

#!/bin/bash
# name of the folder containing all *.dlg files to be analysed
prot='7000'
# path to the folder with these *.dlg files
dlg_dir="$PWD/$prot"
# make a final log
echo 'This is a list of processed files' > "$PWD/final_results.log"
# loop over all *.dlg files in order to extract the table row to the final LOG file
for f in "$dlg_dir"/*.dlg
do
  # strip the directory and the .dlg extension to get the bare file name
  file_name2=$(basename "$f")
  file_name="${file_name2%.dlg}"
  echo "Processing of $f..."
  # here is an expression for GREP to take the line from the table and save it to >> $PWD/final_results.log
done

Answer 1

How about this for a start, assuming your gawk has nextfile support:

gawk '$1~/[[:digit:]]/{ print FILENAME, substr($0,1,match($0,/[[:blank:]]+[^[:blank:]]+$/)-1);nextfile}' *.dlg
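The one-liner above prints the first line in each file whose first field contains a digit. If a log happens to contain such a line before the table (an assumption; it depends on what else the dlg file holds), that line would be picked instead. A minimal sketch of a stricter variant, which waits for the "RMSD TABLE" heading and then takes the first line whose last field is RANKING, might look like the following. It uses plain awk with exit, so nextfile is not needed. The temporary directory and the sample.dlg file are hypothetical and exist only to make the example self-contained.

```shell
#!/bin/bash
# Hypothetical throwaway work directory with one small sample log.
workdir=$(mktemp -d)
cat > "$workdir/sample.dlg" <<'EOF'
123 an earlier line that starts with a number
 RMSD TABLE
    __________

Rank | Sub- | Run  | Binding   | Cluster | Reference       | Grep
_____|______|______|___________|_________|_________________|___________
   1      1      7       -1.43      0.00    178.12           RANKING
   1      2     18       -0.96      1.88    177.35           RANKING
EOF

out="$workdir/final_results.log"
: > "$out"                      # start with an empty result file

for f in "$workdir"/*.dlg; do
  name=$(basename "$f" .dlg)    # file name without directory and extension
  awk -v name="$name" '
    /RMSD TABLE/ { seen = 1; next }        # remember that the table has started
    seen && $NF == "RANKING" {             # first ranked row after the heading
      sub(/[ \t]+RANKING[ \t]*$/, "")      # drop the trailing grep column
      print name, $0                       # file name, then the remaining columns
      exit                                 # one row per file is enough
    }' "$f" >> "$out"
done

cat "$out"
```

In the question's script, the awk call would take the place of the GREP placeholder comment, with "$workdir" replaced by the real folder and the sample file creation removed.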
