I need help editing my awk script. This is the original version:
BEGIN { printf ("CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1\n")
maxatoms=1000
natom=0
found_struct = 0
found_bond = 0
}
{
if( NF == 5 )
{
foundff=0
natom++
fftype[natom]="UNKNOWN"
if ($1 ~ /CT/)
{
fftype[natom] = "C"
foundff=1
}
else if ($1 ~ /OH/)
{
fftype[natom] = "O"
foundff=1
}
else if ($1 ~ /HC/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /N/)
{
fftype[natom] = "N"
foundff=1
}
else if ($1 ~ /H1/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /HO/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 = "C")
{
fftype[natom] = "C"
foundff=1
}
else if ($1 = "O")
{
fftype[natom] = "O"
foundff=1
}
next
x[natom] = $1
y[natom] = $2
z[natom] = $3
if (foundff == 0)
printf("PROBLEM : Atom ff type %s not known\n", $6)
}
}
END {
for (iatom=1; iatom <= natom; iatom++)
{
printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,
iatom, fftype[iatom], iatom, x[iatom], y[iatom], z[iatom])
}
printf ("END\n")
}
This is the type of file I am working with:
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
ECT
I would like to get this as output:
CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
ECT
However, the coordinates (the line that follows CT_1 1 12.011000 0.061000 1.087513) do not come out correctly. Could you take a look and suggest a solution?
Answer 0 (score: 1)
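It is not quite clear how you want to handle the "atoms", but I would suggest using the getline command to fetch the next line whenever a CT_1 line is found. That way, as soon as a matching line is found you can process its coordinates immediately. It is also not clear from the description whether the first field always contains an _ followed by a number; I am assuming there is an _ in it.
Something like this: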
awk 'BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
NR < 6 {next}
/^(CT|OH|HC|N|H1|HO|C|O)_/{a=$1;getline;++n;print "HETATM",n,substr(a,1,1),n,$1,$2,$3;next}
{ print "Bad line! ("$0")" }
' <<EOT
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
OH_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
HC_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
QW_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT
Output:
CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
HETATM 4 O 4 -5.804493575 4.589489777 -8.369482861
HETATM 5 H 5 -5.804493575 4.589489777 -8.369482861
Bad line! (QW_1 3 12.011000 0.061000 0.581330)
Bad line! (-5.804493575 4.589489777 -8.369482861)
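A variation on the command above, as a minimal sketch: it checks the return value of getline (0 at end of input, -1 on error) so that an atom line without a following coordinate line is not silently turned into a record, and it reuses the printf format from the question's END block. The function name to_pdb_getline is hypothetical, and the input layout (five header lines, then alternating atom and coordinate lines) is assumed from the sample in the question.

```shell
# Hypothetical helper: reads the input on stdin, writes PDB-like records on stdout.
to_pdb_getline() {
  awk '
    BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
    NR < 6 { next }                      # skip the five header lines (assumed layout)
    /^(CT|OH|HC|N|H1|HO|C|O)_/ {
      elem = substr($1, 1, 1)            # first letter of the type is the element
      if ((getline) <= 0) exit 1         # no coordinate line follows: stop with an error status
      n++
      printf "HETATM %d %2s %d %14.9f %14.9f %14.9f\n", n, elem, n, $1, $2, $3
      next
    }
    { print "Bad line! (" $0 ")" }
  '
}

# Example run on the first atoms of the sample data from the question.
to_pdb_getline <<'EOT'
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
EOT
```

Saving the element before calling getline matters because getline overwrites $0 and the fields with the coordinate line.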
Answer 1 (score: 1)
I would not go with getline. Try this instead:
awk '/^(H[1CO]|N|C|O)/{printf "HETATM %d %s %d ",++i,substr($1,1,1),i;p=1;next}p' file
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
Just add a BEGIN block to print the header and you should be all set:
BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
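Putting the two pieces together, a minimal sketch of the flag-based approach with the BEGIN header included. It differs from the one-liner in a few assumed details: the flag (here called pending) is cleared after one coordinate line, atom lines must have exactly five fields, and the character class is written as H[1CO] so that the HO type from the question matches. The function name to_pdb_flag is hypothetical.

```shell
# Hypothetical helper: reads the input on stdin, writes PDB-like records on stdout.
to_pdb_flag() {
  awk '
    BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
    # Atom line: five fields and a label starting with a known type.
    NF == 5 && /^(H[1CO]|N|C|O)/ {
      elem = substr($1, 1, 1)
      pending = 1                        # the next line should hold the coordinates
      next
    }
    # Coordinate line: used only when it directly follows an atom line.
    pending && NF == 3 {
      n++
      printf "HETATM %d %2s %d %14.9f %14.9f %14.9f\n", n, elem, n, $1, $2, $3
      pending = 0
      next
    }
    { pending = 0 }                      # anything else resets the flag
  '
}

# Example run on the first atoms of the sample data from the question.
to_pdb_flag <<'EOT'
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
EOT
```

Because the header lines either have a different field count or never follow an atom line, no NR test is needed to skip them.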
Answer 2 (score: 0)
perl -ane ' if ($printNow == 1) {printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,$i,$type,$i,$F[0],$F[1],$F[2]);$printNow =0;}; if (scalar @F == 5 and (/^CT/ or /^OH/ or /^HC/ or /^N/)) {$i++; $printNow =1 ; $type =substr($_,0,1)}' filename
Hope this works.