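Suppose I have a file that is just a repetition of very similar blocks (a simplified example is shown below). What is the fastest way to extract certain blocks and write them to separate files? Every block starts with the same number (the atom count, 6 in the example) on a line of its own. The input file can contain more than a million steps, and each block can contain thousands of atoms. Since I only need a limited number of steps (for example, every 1000th step), I don't want to read the (huge) file or loop over it completely. I was considering bash scripting (sed, or head with grouping), Python (memory mapping and using a regex to store the blocks), or awk (Write blocks in a text file to multiple new files). Is there a method or language I am not aware of? Thanks.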
6
step 1
C 9.0000000 8.3380808 9.0000001
C 9.0000000 9.6619194 8.9999999
H 8.0768455 7.7678700 9.0000001
H 9.9231545 10.2321301 9.0000001
H 8.0768455 10.2321301 9.0000001
H 9.9231545 7.7678700 9.0000001
6
step 2
C 9.00000000 8.33808080 9.00000010
C 9.00000000 9.66191940 8.99999990
H 8.07684550 7.76787000 9.00000010
H 9.90912982 10.23213008 8.83969637
H 8.09087028 10.23213012 9.16030383
H 9.92315450 7.76787000 9.00000010
6
step 3
C 9.00000000 8.33808080 9.00000010
C 9.00000000 9.66191940 8.99999990
H 8.07684550 7.76787000 9.00000010
H 9.86748170 10.23213006 8.68426301
H 8.13251850 10.23213014 9.31573717
H 9.92315450 7.76787000 9.00000010
Answer 0 (score: 0)
I wrote a small POC in awk. Is this close to what you want?
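awk '
/^[0-9]/ { print "skipping " $0; next; }   # atom-count line: log it, do not write it
/step / { if (fn != "") close(fn);         # close the previous file so open descriptors do not pile up
          fn = sprintf("%s.%s", $1, $2); print "assigned fn = ", fn; }
/^ *[A-Z]/ { print $0 >> fn; print "sent ", $0, " to ", fn; }   # atom line: append to the current file
' infile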
Output:
skipping 6
assigned fn = step.1
sent C 9.0000000 8.3380808 9.0000001 to step.1
sent C 9.0000000 9.6619194 8.9999999 to step.1
sent H 8.0768455 7.7678700 9.0000001 to step.1
sent H 9.9231545 10.2321301 9.0000001 to step.1
sent H 8.0768455 10.2321301 9.0000001 to step.1
sent H 9.9231545 7.7678700 9.0000001 to step.1
skipping 6
assigned fn = step.2
sent C 9.00000000 8.33808080 9.00000010 to step.2
sent C 9.00000000 9.66191940 8.99999990 to step.2
sent H 8.07684550 7.76787000 9.00000010 to step.2
sent H 9.90912982 10.23213008 8.83969637 to step.2
sent H 8.09087028 10.23213012 9.16030383 to step.2
sent H 9.92315450 7.76787000 9.00000010 to step.2
skipping 6
assigned fn = step.3
sent C 9.00000000 8.33808080 9.00000010 to step.3
sent C 9.00000000 9.66191940 8.99999990 to step.3
sent H 8.07684550 7.76787000 9.00000010 to step.3
sent H 9.86748170 10.23213006 8.68426301 to step.3
sent H 8.13251850 10.23213014 9.31573717 to step.3
sent H 9.92315450 7.76787000 9.00000010 to step.3
Resulting files:
$: cat step.1
C 9.0000000 8.3380808 9.0000001
C 9.0000000 9.6619194 8.9999999
H 8.0768455 7.7678700 9.0000001
H 9.9231545 10.2321301 9.0000001
H 8.0768455 10.2321301 9.0000001
H 9.9231545 7.7678700 9.0000001
$: cat step.2
C 9.00000000 8.33808080 9.00000010
C 9.00000000 9.66191940 8.99999990
H 8.07684550 7.76787000 9.00000010
H 9.90912982 10.23213008 8.83969637
H 8.09087028 10.23213012 9.16030383
H 9.92315450 7.76787000 9.00000010
$: cat step.3
C 9.00000000 8.33808080 9.00000010
C 9.00000000 9.66191940 8.99999990
H 8.07684550 7.76787000 9.00000010
H 9.86748170 10.23213006 8.68426301
H 8.13251850 10.23213014 9.31573717
H 9.92315450 7.76787000 9.00000010
Note that your example has no leading space in the first section but does have one in the subsequent sections.
Adjust as needed; I hope this helps.
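To get closer to the "every 1000th step" requirement, the POC could be trimmed down along the following lines. This is only a sketch under some assumptions: the function name `extract_steps`, its arguments (input file, stride, optional output prefix, optional last step) and the awk variables `stride`, `prefix` and `maxstep` are made up for illustration, and the header lines are assumed to look exactly like the sample (a bare integer, then `step <k>`). It still scans the file sequentially, because the blocks do not have a constant byte length (the sample already mixes 7 and 8 decimal places), but it skips unwanted blocks without any output, closes each file as soon as its block ends, and can stop early once the last wanted step has passed.

```shell
# Hypothetical helper: extract_steps INFILE STRIDE [PREFIX] [MAXSTEP]
# Writes every STRIDE-th block of INFILE to PREFIX.<step number>.
# If MAXSTEP is given (and > 0), reading stops at the first step beyond it.
extract_steps() {
    awk -v stride="$2" -v prefix="${3:-step}" -v maxstep="${4:-0}" '
        # Header line 1: the atom count, a bare integer on its own line.
        /^ *[0-9]+ *$/ { next }

        # Header line 2: "step <k>". Decide whether this block is wanted.
        $1 == "step" {
            if (fn != "") { close(fn); fn = "" }       # finish the previous block
            if (maxstep > 0 && $2 + 0 > maxstep) exit  # nothing more to extract
            if ($2 % stride == 0) fn = prefix "." $2   # wanted block: pick its file
            next
        }

        # Atom lines: written only while a wanted block is active.
        fn != "" { print > fn }
    ' "$1"
}
```

For example, `extract_steps infile 1000` would write `step.1000`, `step.2000`, and so on, and `extract_steps infile 1000 step 5000` would stop reading shortly after step 5000. Using `>` instead of `>>` means a rerun overwrites the earlier output files rather than appending to them.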