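I have a large file like the one below, and I want to split it into multiple files, with each file ending after an ENDMDL line. For the file below, there should be three output files named pose1.av, pose2.av and pose3.av.
MODEL 1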
SML 170 O PRO A 17 16.893 3.030 0.799 1.00 1.00 O
SML 171 OXT PRO A 17 18.167 2.722 2.597 1.00 1.00 O
TER 172 PRO A 17
ENDMDL
MODEL 2
SML 4 CG ARG A 1 -2.171 -7.105 -4.278 1.00 1.00 C
SML 5 CD ARG A 1 -1.851 -8.581 -4.022 1.00 1.00 C
SML 113 HD1 HIS A 12 2.465 -8.206 5.062 1.00 1.00 H
TER 114 HIS A 12
ENDMDL
MODEL 3
SML 101 N HIS A 12 3.765 -3.995 7.233 1.00 1.00 N
SML 102 CA HIS A 12 2.584 -4.736 6.934 1.00 1.00 C
TER 103 HIS A 12
ENDMDL
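To try the answers below, the sample can be saved as `file` with a here-document. `file` is the input name most of the answers use. Counting the `ENDMDL` lines tells you how many output files to expect, which is three here. This is a minimal sketch that assumes bash and assumes the sample is representative of the real file:

```shell
# Work in a scratch directory so no existing pose*.av files are touched.
cd "$(mktemp -d)" || exit 1

# Save the sample from the question as "file".
cat > file <<'EOF'
MODEL 1
SML 170 O PRO A 17 16.893 3.030 0.799 1.00 1.00 O
SML 171 OXT PRO A 17 18.167 2.722 2.597 1.00 1.00 O
TER 172 PRO A 17
ENDMDL
MODEL 2
SML 4 CG ARG A 1 -2.171 -7.105 -4.278 1.00 1.00 C
SML 5 CD ARG A 1 -1.851 -8.581 -4.022 1.00 1.00 C
SML 113 HD1 HIS A 12 2.465 -8.206 5.062 1.00 1.00 H
TER 114 HIS A 12
ENDMDL
MODEL 3
SML 101 N HIS A 12 3.765 -3.995 7.233 1.00 1.00 N
SML 102 CA HIS A 12 2.584 -4.736 6.934 1.00 1.00 C
TER 103 HIS A 12
ENDMDL
EOF

# Each ENDMDL line closes one model, so this is the number of output files.
expected=$(grep -c '^ENDMDL$' file)
echo "$expected"   # prints 3
```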
Answer 0 (score: 4)
Using bash and sed is very efficient:
n=0
while IFS= read -r firstline; do
{ echo "$firstline"; sed '/^ENDMDL$/q'; } > "pose$((++n)).av"
done < file
It is more efficient than the other Bash answer: each output file is opened only once, and most of the parsing is done by sed rather than by bash.
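The "opened only once" point depends on where the redirection sits. A redirection attached to a `{ ...; }` group opens the file once for every command inside the group. By contrast, `>>` on each `echo` reopens the file for every line. The following is a minimal illustration, and the file names `grouped.txt` and `per_line.txt` are hypothetical:

```shell
cd "$(mktemp -d)" || exit 1

# One open: both commands write through the same descriptor.
{ echo MODEL 1; echo ENDMDL; } > grouped.txt

# Two opens: each >> opens the file, appends one line, and closes it again.
echo MODEL 1 >> per_line.txt
echo ENDMDL >> per_line.txt

# Same bytes on disk; only the number of open() calls differs.
cmp -s grouped.txt per_line.txt && echo same
```

For a file with many models and many lines per model, the per-line version pays that open/close cost on every line.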
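Answer 1 (score: 4)
csplit can do this out of the box (note that csplit numbers the pieces from 0, so the files come out as pose0.av, pose1.av and pose2.av):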
csplit -z -s -f pose -b "%01d.av" file '/^ENDMDL$/+1' '{*}'
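Because csplit counts its pieces from 0, a rename pass is one way to get the 1-based names the question asks for. This sketch assumes GNU csplit, because `-z`, `-b` and the `{*}` repeat count are GNU extensions. It also uses a cut-down, hypothetical three-model input in place of the real file. A temporary prefix (`tmp_pose`, also hypothetical) keeps the renames from colliding with existing names:

```shell
cd "$(mktemp -d)" || exit 1
# Cut-down stand-in for the real input: three MODEL ... ENDMDL blocks.
printf 'MODEL 1\nA\nENDMDL\nMODEL 2\nB\nENDMDL\nMODEL 3\nC\nENDMDL\n' > file

# Split after every ENDMDL line; -z drops the empty trailing piece.
csplit -z -s -f tmp_pose -b '%d.av' file '/^ENDMDL$/+1' '{*}'

# Shift tmp_pose0.av, tmp_pose1.av, ... to pose1.av, pose2.av, ...
for f in tmp_pose*.av; do
    n=${f#tmp_pose}   # strip the prefix, e.g. "0.av"
    n=${n%.av}        # strip the suffix, e.g. "0"
    mv -- "$f" "pose$((n + 1)).av"
done

ls pose*.av
```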
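Answer 2 (score: 3)
Awk is a good fit for this task (a multi-character RS like the one below needs an awk that supports it, such as GNU awk or mawk):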
awk '{file="pose" (++i) ".av";printf "%s%s",$0,RS > file;close(file)}' RS='ENDMDL\n' file
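The awk answer relies on RS being set to `ENDMDL\n`. Each record is then one whole model with its `ENDMDL` line removed, and the answer's `printf "%s%s",$0,RS` writes the separator back. The sketch below checks the record split on a hypothetical cut-down input. It assumes GNU awk or mawk, since a multi-character RS is not POSIX:

```shell
cd "$(mktemp -d)" || exit 1
printf 'MODEL 1\nA\nENDMDL\nMODEL 2\nB\nENDMDL\nMODEL 3\nC\nENDMDL\n' > file

# One record per model, so NR at the end equals the number of output files.
records=$(awk 'END { print NR }' RS='ENDMDL\n' file)
echo "$records"   # prints 3

# The first record is the first model without its trailing "ENDMDL\n".
first=$(awk 'NR == 1 { printf "%s", $0 }' RS='ENDMDL\n' file)
printf '%s\n' "$first"
```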
Answer 3 (score: 3)
Using a Perl one-liner:
perl -ne '$fh or open $fh, ">", "pose".++$i.".av" or die $!; print $fh $_; undef $fh if /^ENDMDL/' file.txt
Answer 4 (score: 2)
Pure Bash:
cnt=1
while IFS= read -r line; do
printf '%s\n' "$line" >> "pose${cnt}.av"
[ "$line" == "ENDMDL" ] && let cnt+=1
done < filename.txt
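In a line-by-line bash loop, `read` options matter. Plain `read line` strips leading and trailing whitespace and treats a backslash as an escape character. `IFS= read -r line` keeps each line exactly as stored, which is what you want when copying fixed-width records. The following is a minimal demonstration on a hypothetical line:

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical input line: two leading spaces and a backslash.
printf '  SML 1 \\N\n' > line.txt

read line < line.txt            # default: trims the spaces, drops the backslash
plain=$line
IFS= read -r line < line.txt    # keeps the line byte-for-byte
raw=$line

printf '[%s]\n' "$plain" "$raw"
# [SML 1 N]
# [  SML 1 \N]
```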
Answer 5 (score: 2)
awk '/^MODEL/{if (out) close(out); out="pose" (++cnt) ".av"} {print > out}' file
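Whichever approach you use, one quick sanity check is that concatenating the output files in numeric order reproduces the input exactly. The sketch below runs the MODEL-triggered awk split on a hypothetical cut-down input and then compares the result. It closes each finished file so that awks with a low open-file limit also cope with many models. It assumes bash for the `for ((...))` loop:

```shell
cd "$(mktemp -d)" || exit 1
printf 'MODEL 1\nA\nENDMDL\nMODEL 2\nB\nENDMDL\nMODEL 3\nC\nENDMDL\n' > file

# Start a new output file at every MODEL line, closing the previous one.
awk '/^MODEL/{if (out) close(out); out="pose" (++cnt) ".av"} {print > out}' file

# The pieces, in numeric order (pose1, pose2, ..., not glob order), must add up to the input.
n=$(grep -c '^ENDMDL$' file)
for ((i = 1; i <= n; i++)); do cat "pose$i.av"; done | cmp -s - file && echo ok
```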