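A second question about processing log files that contain a table. I am analysing a large number of dlg text files located in workdir. Each file contains a table in the following format (usually at the end of the log):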
RMSD TABLE
__________
_____________________________________________________________________
| | | | | |
Rank | Sub- | Run | Binding | Cluster | Reference | Grep
| Rank | | Energy | RMSD | RMSD | Pattern
_____|______|______|___________|_________|_________________|___________
1 1 7 -1.43 0.00 178.12 RANKING
1 2 18 -0.96 1.88 177.35 RANKING
2 1 4 -0.97 0.00 178.43 RANKING
3 1 13 -0.60 0.00 178.03 RANKING
4 1 5 -0.56 0.00 198.10 RANKING
5 1 16 +0.01 0.00 189.71 RANKING
6 1 3 +0.06 0.00 176.95 RANKING
7 1 19 +0.10 0.00 177.27 RANKING
8 1 17 +0.13 0.00 177.60 RANKING
9 1 8 +0.20 0.00 177.05 RANKING
10 1 20 +0.27 0.00 177.43 RANKING
11 1 10 +0.34 0.00 176.33 RANKING
12 1 6 +0.37 0.00 177.30 RANKING
13 1 9 +0.44 0.00 175.48 RANKING
14 1 2 +0.46 0.00 175.67 RANKING
15 1 11 +0.84 0.00 177.52 RANKING
15 2 12 +1.31 1.95 178.03 RANKING
16 1 14 +1.29 0.00 201.01 RANKING
17 1 15 +1.65 0.00 175.50 RANKING
18 1 1 +1.96 0.00 186.83 RANKING
Run time 3.909 sec
Idle time 0.817 sec
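The goal is to loop over all the .dlg files and take, from each table, the single line corresponding to the first row (ignoring the header), omitting the last column (which is normally used for grep identification). For the example table above, that is the following row: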
1 1 7 -1.43 0.00 178.12
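I then need to append this line to final_log.txt, preceded by the name of the log file it came from. Based on my recent experience, a possible template for my Bash workflow (for processing multiple files) could be: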
#!/bin/bash
# name of the folder containing all *.dlg files to be analysed
prot='7000'
# all *.dlg files in that folder (an array, so paths with spaces survive)
FILES=("$PWD/$prot"/*.dlg)
# make a final log
echo 'This is a list of processed files' > "$PWD/final_results.log"
# loop over all *.dlg files in order to extract the first row of the RMSD table to the final log file
for f in "${FILES[@]}"
do
    file_name2=$(basename "$f")
    # strip the .dlg extension
    file_name="${file_name2%.dlg}"
    echo "Processing of $f..."
    # here is an expression for GREP to take the line from the table and save it to >> "$PWD/final_results.log"
done
Answer 0 (score: 0)
How about this for a start, assuming a gawk that supports nextfile:
gawk '$1~/[[:digit:]]/{ print FILENAME, substr($0,1,match($0,/[[:blank:]]+[^[:blank:]]+$/)-1);nextfile}' *.dlg
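The one-liner above fires on the first line whose first field contains a digit. If a full .dlg log has such a line before the RMSD table (an assumption, since only the table is shown in the question), it would pick up the wrong row. A possible variation is to key on the RANKING grep pattern in the last column instead. The sketch below does that and also strips the .dlg extension from the file name. The function name first_ranking_row is made up for illustration. It uses a per-file flag rather than nextfile, so it should also run on awks that lack nextfile, at the cost of reading each file to the end.

```shell
#!/bin/bash
# Sketch: for every .dlg file given as an argument, print
#   <file name without .dlg> <first RANKING row without its last column>
# Assumption: table rows are the only lines whose last field is "RANKING".
first_ranking_row() {
    awk '
        FNR == 1 { done = 0 }                    # new input file: reset the flag
        !done && $NF == "RANKING" {
            line = $0
            sub(/[ \t]+RANKING[ \t]*$/, "", line)   # drop the grep-pattern column
            sub(/^[ \t]+/, "", line)                # drop leading indentation
            n = split(FILENAME, parts, "/")         # keep only the base name
            name = parts[n]
            sub(/\.dlg$/, "", name)
            print name, line
            done = 1                                # ignore later rows of this file
        }
    ' "$@"
}

# Hypothetical usage, following the folder layout from the question:
# first_ranking_row "$PWD"/7000/*.dlg >> "$PWD/final_results.log"
```

Because awk is handed all the files at once, the for loop from the question is not needed for the extraction itself. It can stay if the "Processing of ..." messages are wanted.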