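I've been working on this for several days. My project uses Perl to process Protein Data Bank files (*.ent). I have a whole directory of them, and for each file I need to calculate the distance between atoms CD1 and OD2. If OD2 doesn't exist, I calculate the distance between atoms CD1 and OE2 instead. OD2 and OE2 never both occur in the same file. Getting the values isn't the problem: I can print them for every file directly inside the if and elsif conditionals.
But as soon as I leave the if/elsif conditional, the values for each file are gone. I've already done a lot of hard work setting up the loop that opens every file so that I can print a result for each one.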
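Basically, to calculate the distance I need the Euclidean distance between two points, P(x1, y1, z1) and Q(x2, y2, z2), where P is the x, y, z coordinates of the CD1 atom and Q is the x, y, z coordinates of OD2 or OE2. In my code I call the latter the "alt" (alternative) coordinates, as opposed to the CD1 coordinates.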
$$PQ = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
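A minimal sketch of the formula above as a Perl subroutine, assuming each point is passed as an array reference `[x, y, z]`; the name `distance` is hypothetical and not from the original script. It uses Perl's built-in `sqrt`, which is enough here because a sum of squares is never negative, so `Math::Complex` isn't required for this step.

```perl
use strict;
use warnings;

# Euclidean distance between two points, each given as an [x, y, z] array reference.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($q->[$_] - $p->[$_]) ** 2 for 0 .. 2;   # (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2
    return sqrt $sum;                                # core sqrt; the sum is never negative
}

# CD1 of TRP 16 and OE2 of GLU 9, taken from the sample ATOM lines in this post.
my $cd1 = [54.518, 19.371, 13.545];
my $oe2 = [53.831, 15.724, 15.123];
printf "%.3f\n", distance($cd1, $oe2);   # prints 4.033
```

The distance is symmetric, so it does not matter whether CD1 is passed as P or as Q.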
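To process the roughly 800 *.ent files, I use a foreach loop to iterate over all the files in the directory and a while loop to read through each one line by line. The columns in a *.ent file are separated by a variable number of spaces, so I split each line on whitespace into an array, @fields. Counting from zero, $fields[2] is the third column of the .ent file being processed, which is the atom name.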
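The splitting described above could look roughly like this. `parse_atom_line` is a hypothetical helper; the `^ATOM\s` anchor and the field-count check are assumptions added here so that a line which merely contains the word ATOM somewhere (for example a REMARK) is skipped. `split ' '` is Perl's special awk-style split, which also discards leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: split one ATOM record on whitespace, as described above.
# Returns (atom name, residue name, x, y, z), or an empty list for any other line.
sub parse_atom_line {
    my ($line) = @_;
    return unless $line =~ /^ATOM\s/;   # only lines that start with the ATOM record name
    my @fields = split ' ', $line;      # split ' ' also ignores leading whitespace
    return unless @fields >= 9;         # indices 0..8 must exist
    return @fields[2, 3, 6, 7, 8];
}

my ($atom, $residue, $x, $y, $z) =
    parse_atom_line('ATOM 8 CD1 TRP A 16 54.518 19.371 13.545 1.00 0.00 C');
print "$atom $residue $x $y $z\n";   # prints "CD1 TRP 54.518 19.371 13.545"
```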
Below are the lines I parse from one of my *.ent files, although the files contain much more information than this. I only split a line if it contains ATOM.
ATOM 1 CB GLU A 9 53.764 15.456 11.540 1.00 0.00 C
ATOM 2 CG GLU A 9 53.265 15.125 12.928 1.00 0.00 C
ATOM 3 CD GLU A 9 54.264 15.606 13.972 1.00 0.00 C
ATOM 4 OE1 GLU A 9 55.435 15.909 13.653 1.00 0.00 O
ATOM 5 OE2 GLU A 9 53.831 15.724 15.123 1.00 0.00 O1-
ATOM 6 CB TRP A 16 51.989 19.241 13.113 1.00 0.00 C
ATOM 7 CG TRP A 16 53.254 19.906 13.560 1.00 0.00 C
ATOM 8 CD1 TRP A 16 54.518 19.371 13.545 1.00 0.00 C
ATOM 9 CD2 TRP A 16 53.380 21.238 14.053 1.00 0.00 C
...
There are more ATOM records, but these are all I need.
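Reading each line from left to right, the first three columns that contain decimals are the x, y, z coordinates. Counting from zero, column 2 is the atom name (this is very important to me, because my calculation is based on it), and column 3 is the amino-acid abbreviation of the two amino acids in my files. The residue names don't really matter, because CD1, OD2 and OE2 are the only atoms I care about and none of them is repeated, so I can't mix up atoms from different residues.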
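One caveat, stated as an assumption about real files rather than about these samples: PDB files are a fixed-width format, so splitting on whitespace only works while every field is separated by at least one space. (The sample lines above appear to have lost their column alignment.) A blank chain identifier shifts all later field indices by one, and large coordinates can run into each other. A sketch that reads fields by column position instead, using the ATOM record layout from the wwPDB format v3.3 specification; `parse_atom_columns` is hypothetical, and the test line is built with `sprintf` rather than taken from a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read an ATOM record by fixed column positions.
# Columns (1-based, wwPDB v3.3 ATOM record): atom name 13-16, residue name 18-20,
# x 31-38, y 39-46, z 47-54. substr() offsets below are 0-based.
sub parse_atom_columns {
    my ($line) = @_;
    return unless $line =~ /^ATOM/ && length($line) >= 54;
    my $trim = sub { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s };
    my $name    = $trim->(substr($line, 12, 4));
    my $residue = $trim->(substr($line, 17, 3));
    # Adding 0 turns "  54.518" into the number 54.518 (leading spaces are allowed).
    my ($x, $y, $z) = map { $_ + 0 } substr($line, 30, 8), substr($line, 38, 8), substr($line, 46, 8);
    return ($name, $residue, $x, $y, $z);
}

# A correctly aligned ATOM line, built with the column widths above.
my $line = sprintf '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f',
    'ATOM', 8, ' CD1', '', 'TRP', 'A', 16, '', 54.518, 19.371, 13.545, 1.00, 0.00;
my ($name, $res, $x, $y, $z) = parse_atom_columns($line);
print "$name $res $x $y $z\n";   # prints "CD1 TRP 54.518 19.371 13.545"
```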
My whole Perl script for calculating the distance looks like this. It has been through a lot!
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Math::Complex;
# This program will take all *.ent files in a directory and calculate the distance of OE2 of glutamate to CD1 of tryptophan.
# It will also calculate the distance of OD2 of aspartate to CD1 of tryptophan.
# Logic: If residue =~ ASP, then calculate the distance of OD2 to CD1.
# Logic: If residue =~ GLU, then calculate the distance of OE2 to CD1.
# 3D distance = sqrt[(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2]
# This is the formula for the distance between points (x1, y1, z1) and (x2, y2, z2) in three-dimensional space.
# x1, y1, z1 = OD2 atom of aspartate, or OE2 atom of glutamate.
# x2, y2, z2 = CD1 atom of tryptophan.
print "Watch and Learn!\n"; # Because I'm a badass biological programmer solving a complex problem.
my @files = <*.ent>; # Save all the .ent files in the current directory as an array.
my $file_array; # This will be used to count how many PDB files were in the directory.
foreach my $file (@files) # A foreach loop goes through the PDB files in the directory one at a time.
{
    my ($base, $ext) = split(/\./, $file); # Split the file name at the period (.) into a base and an extension.
    print "filename is $file\n";
    print "basename = $base \n";
    print "extension = $ext \n";
    my $OUTFILE_CD1_Distance = join('_', $base, "CD1_dist"); # *_CD1_dist.txt denotes the file holding the distance result for this input file.
    $OUTFILE_CD1_Distance = join('.', $OUTFILE_CD1_Distance, "txt"); # Add the .txt extension to the output file name.
    open(my $fh, '<:encoding(UTF-8)', $file) # Open a file to read from. This is my .ent file.
        or die "Could not open file $file $!";
    open(my $Coordinates, '>', $OUTFILE_CD1_Distance)
        or die "Could not open file $OUTFILE_CD1_Distance $!"; # Open a file to write to. Coordinates is for PDB coordinates.
    my $counter;
    $counter = 0;
    while (my $line = <$fh>) # The while loop goes through all the rows of the original PDB file until it reaches the end.
    {
        chomp $line;
        $counter++;
        if ($line =~ m/ATOM/)
        {
            # print "$line\n";
            my @fields = split(/\s+/, $line);
            # save the fields 2, 3, 6, 7, 8
            my $Atom = $fields[2];
            my $Residue = $fields[3];
            my $x_coordinate = $fields[6];
            my $y_coordinate = $fields[7];
            my $z_coordinate = $fields[8];
            # Now that everything is parsed, calculate the distance.
            # my $delta_x = $x_CD1_coordinate - $x_alt_coordinate;
            # my $delta_x_squared = $delta_x * $delta_x;
            # my $delta_y = $y_CD1_coordinate - $y_alt_coordinate;
            # my $delta_y_squared = $delta_y * $delta_y;
            # my $delta_z = $z_CD1_coordinate - $z_alt_coordinate;
            # my $delta_z_squared = $delta_z * $delta_z;
            # my $sum_of_squares_diff = $delta_x_squared + $delta_y_squared + $delta_z_squared;
            # my $CD1_distance = sqrt($sum_of_squares_diff);
            if ($fields[2] =~ m/CD1/)
            {
                my $x_CD1_coordinate = $x_coordinate;
                my $y_CD1_coordinate = $y_coordinate;
                my $z_CD1_coordinate = $z_coordinate;
                # print "$x_CD1_coordinate";
                # print "$z_CD1_coordinate\n";
            }
            elsif ($fields[2] =~ m/OD2|OE2/)
            {
                my $x_alt_coordinate = $x_coordinate;
                my $y_alt_coordinate = $y_coordinate;
                my $z_alt_coordinate = $z_coordinate;
                # print "$x_alt_coordinate\n";
                # print "$x_coordinate";
            }
            print "\$x_CD1_coordinate\n";
            # my $x_coordinate = $ echo $line | awk '{print $3}';
            # print $x_coordinate;
        }
    } # ending while loop
} # ending foreach loop
$file_array = scalar(@files);
print "\nI just processed $file_array ent files for you ... Computing the distance from CD1\n";
End of script.
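My problem is this: inside the if ($fields...) conditionals I can put print statements that print the six variables I need for the distance equation, and they print exactly the right values. But when I leave the if conditional and try to print one of those six variables outside it, it doesn't work.
In short, when I try to run the formula I commented out in the program, it's as if I'm in one room and my program is in another. Inside the loop I can only ever see the x, y, z coordinates of one point at a time.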
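What is being described here is Perl's lexical scoping: a variable declared with `my` exists only from its declaration to the end of the enclosing block, so each `my $x_CD1_coordinate` inside the `if` block is a fresh variable that disappears at the closing brace. A minimal illustration (the "atom x" strings are made-up inputs, with x values taken from the sample lines):

```perl
use strict;
use warnings;

# Made-up "atom x" pairs; the x values are the CB, CD1 and OE2 x coordinates above.
my @lines = ('CB 53.764', 'CD1 54.518', 'OE2 53.831');

my $cd1_x;    # declared before the loop, so it outlives every iteration
foreach my $line (@lines) {
    my ($atom, $x) = split ' ', $line;
    if ($atom eq 'CD1') {
        my $inside = $x;    # lexical to this if block only
        print "inside the if: $inside\n";
        $cd1_x = $x;        # assigns the outer variable, which stays visible
    }
    # Referring to $inside here would be a compile-time error under 'use strict'.
}
print "after the loop: $cd1_x\n";   # prints "after the loop: 54.518"
```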
Do you understand my problem? Can you offer any advice?
Thanks!
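Hey, you were all right about variable scope. I should have declared my variables higher up, outside the while loop. I sat down with a colleague and we fixed the program!
#!/usr/bin/perl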
use strict;
use warnings;
use 5.010;
# This program will take all *.ent files in a directory and calculate the distance of OE2 of glutamate to CD1 of tryptophan.
# It will also calculate the distance of OD2 of aspartate to CD1 of tryptophan.
# Logic: If residue =~ ASP, then calculate the distance of OD2 to CD1.
# Logic: If residue =~ GLU, then calculate the distance of OE2 to CD1.
# 3D distance = sqrt[(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2]
# This is the formula for the distance between points (x1, y1, z1) and (x2, y2, z2) in three-dimensional space.
# x1, y1, z1 = OD2 atom of aspartate, or OE2 atom of glutamate.
# x2, y2, z2 = CD1 atom of tryptophan.
print "Watch and Learn!\n"; # Because I'm a badass biological programmer solving a complex problem.
my @files = <*.ent>; # Save all the .ent files in the current directory as an array.
my $file_array; # This will be used to count how many PDB files were in the directory.
foreach my $file (@files) # A foreach loop goes through the PDB files in the directory one at a time.
{
    my ($base, $ext) = split(/\./, $file); # Split the file name at the period (.) into a base and an extension.
    print "filename is $file\n";
    print "basename = $base \n";
    print "extension = $ext \n";
    my $OUTFILE_CD1_Distance = join('_', $base, "CD1_dist"); # *_CD1_dist.txt denotes the file holding the distance result for this input file.
    $OUTFILE_CD1_Distance = join('.', $OUTFILE_CD1_Distance, "txt"); # Add the .txt extension to the output file name.
    open(my $fh, '<:encoding(UTF-8)', $file) # Open a file to read from. This is my .ent file.
        or die "Could not open file $file $!";
    open(my $Coordinates, '>', $OUTFILE_CD1_Distance)
        or die "Could not open file $OUTFILE_CD1_Distance $!"; # Open a file to write to. Coordinates is for PDB coordinates.
    my $counter;
    $counter = 0;
    # Declared here, outside the while loop, so the values are still there after the loop ends.
    my ($x_CD1_coordinate, $y_CD1_coordinate, $z_CD1_coordinate);
    my ($x_alt_coordinate, $y_alt_coordinate, $z_alt_coordinate);
    while (my $line = <$fh>) # The while loop goes through all the rows of the original PDB file until it reaches the end.
    {
        chomp $line;
        $counter++;
        if ($line =~ m/^ATOM\s/) # Only ATOM records; anchored so "ATOM" elsewhere in a line doesn't match.
        {
            # print "$line\n";
            my @fields = split(/\s+/, $line);
            # save the fields 2, 3, 6, 7, 8
            my $atom = $fields[2];
            my $residue = $fields[3];
            my $x_coordinate = $fields[6];
            my $y_coordinate = $fields[7];
            my $z_coordinate = $fields[8];
            if ($fields[2] =~ m/CD1/)
            {
                $x_CD1_coordinate = $x_coordinate;
                $y_CD1_coordinate = $y_coordinate;
                $z_CD1_coordinate = $z_coordinate;
                # print "$x_CD1_coordinate";
                # print "$z_CD1_coordinate\n";
            }
            elsif ($fields[2] =~ m/OD2|OE2/)
            {
                $x_alt_coordinate = $x_coordinate;
                $y_alt_coordinate = $y_coordinate;
                $z_alt_coordinate = $z_coordinate;
                # print "$x_alt_coordinate\n";
                # print "$x_coordinate";
            }
            # my $x_coordinate = $ echo $line | awk '{print $3}';
            # print $x_coordinate;
        }
    } # ending while loop
    # Skip files where CD1 or OD2/OE2 was not found instead of computing with undefined values.
    unless (defined $x_CD1_coordinate && defined $x_alt_coordinate)
    {
        warn "$file: CD1 or OD2/OE2 not found, skipping\n";
        next;
    }
    print "This is my x CD1 $x_CD1_coordinate\n";
    print "This is my y alt $y_alt_coordinate\n";
    # Now that everything is parsed, calculate the distance.
    my $delta_x = $x_CD1_coordinate - $x_alt_coordinate;
    my $delta_x_squared = $delta_x * $delta_x;
    my $delta_y = $y_CD1_coordinate - $y_alt_coordinate;
    my $delta_y_squared = $delta_y * $delta_y;
    my $delta_z = $z_CD1_coordinate - $z_alt_coordinate;
    my $delta_z_squared = $delta_z * $delta_z;
    my $sum_of_squares_diff = $delta_x_squared + $delta_y_squared + $delta_z_squared;
    my $CD1_distance = sqrt($sum_of_squares_diff); # Built-in sqrt; the sum of squares is never negative.
    # Finally print the distance to the screen and also to the output file:
    print "This is the distance $CD1_distance\n";
    print $Coordinates "$CD1_distance\n";
} # ending foreach loop
$file_array = scalar(@files);
print "\nI just processed $file_array ent files for you ... Computing the distance from CD1\n";
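A variation on the fixed script, offered as a sketch rather than a drop-in replacement: keep the coordinates in a hash keyed by atom name for the whole file, then use OD2 if it was seen and otherwise OE2, which mirrors the rule at the top of the question. `cd1_partner_distance` is a hypothetical name. It assumes one CD1 per file, as in the sample; leucine, isoleucine, phenylalanine and tyrosine also have a CD1 atom, and with several CD1 atoms the hash would keep only the last one. Because it reads from a filehandle, it can be tested with an in-memory string.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or operator //

# Hypothetical helper: read one .ent filehandle and return the CD1-to-OD2 distance,
# falling back to OE2 when OD2 is absent. Returns undef if CD1 or both partners are missing.
sub cd1_partner_distance {
    my ($fh) = @_;
    my %coords;    # atom name => [x, y, z]; lives for the whole file, not just one line
    while (my $line = <$fh>) {
        next unless $line =~ /^ATOM\s/;
        my ($atom, $x, $y, $z) = (split ' ', $line)[2, 6, 7, 8];
        $coords{$atom} = [$x, $y, $z] if defined $atom && $atom =~ /^(?:CD1|OD2|OE2)$/;
    }
    my $cd1     = $coords{CD1};
    my $partner = $coords{OD2} // $coords{OE2};    # prefer OD2, otherwise OE2
    return unless $cd1 && $partner;
    my $sum = 0;
    $sum += ($cd1->[$_] - $partner->[$_]) ** 2 for 0 .. 2;
    return sqrt $sum;
}

# Possible use inside the existing foreach loop:
#   open(my $fh, '<', $file) or die "Could not open file $file $!";
#   my $d = cd1_partner_distance($fh);
#   print defined $d ? "$file: $d\n" : "$file: CD1 or OD2/OE2 missing\n";
```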
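I'm glad I was able to do this in a single script. I had read some Perl scripts that calculate the distance between residues, but their function calls made no sense to me, so I'm happy I managed to write a program that calculates the distance between two atoms!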