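I have a file that has some text in the first few lines, followed by some tabular output. I want to parse the first line and then skip to the tabular output, but I'm having some trouble (even though it sounds simple). My strategy is to find the header
Example input file: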
Query [VOG0001]|NC_002014-NP_040572.1| 1296..1562 + 88 aa|G V protein
Match_columns 100
No_of_seqs 7 out of 16
Neff 2.6
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
2 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
3 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
Attempted parsing script:
open (IN, $hhr_report) or die "cannot open $hhr_report\n";
while (my $line=<IN>){
    if ($line =~/^Query/){
        my @query=split(/\|/,$line);
        my $vogL=$query[0];
        my @vogL2=split(/\s+/,$vogL);
        $vog=$vogL2[1];
        $vog=~ s/\[//g;
        $vog=~ s/\]//g;
        print "query_array:\t@query\n";
        print "query_vog:\t$vog\n";
    }
    next until ($line =~/Query HMM/);
    #next if ($line =~/Query HMM/);
    #next until ($line =~/^No\s[0-9]+/);
    print "$line\n";
    my @columns = split(/\s+/,$line);
    ... }
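I'm not sure whether I'm missing something simple. Right now I seem to be parsing only the header line (the one containing Query HMM), but I want to parse the lines that come after it.
Any help is appreciated.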
Answer 0: (score: 1)
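I would try discarding everything up to and including the header line (or parsing the first line on the way), and then start parsing the lines after the header, like this: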
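#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: the report path is the first command-line argument
# (the original snippet used $hhr_report without declaring it).
my $hhr_report = shift @ARGV or die "Usage: $0 <hhr_report>\n";
open (my $fh, "<", $hhr_report) or die "Cannot open $hhr_report: $!";

my $header;
do {
    $header = <$fh>;
    # Stop at end of file instead of looping forever when no header exists
    die "No table header found in $hhr_report\n" unless defined $header;
    # If you need to parse lines before the header for some reason,
    # do that here
} while ( !is_header($header) );

# If you like, parse the header line to get the column names
my @lines;
while ( my $line = <$fh> ) {
    my @columns = split_line($line);
    push @lines, \@columns;
}

sub is_header {
    my $line = shift;
    return $line =~ /^No\sHit/ ? 1 : 0;
}

sub split_line {
    my $line = shift;
    # Here, use a regex to split the columns, depending on what you need.
    # You could also consider reporting an error if the line is malformed
    # or missing important values.
    # Placeholder: a plain whitespace split, which also breaks up
    # multi-word hit descriptions.
    return split ' ', $line;
}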
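One possible way to fill in `split_line` is sketched below. It is only an illustration based on the three sample rows in the question: the helper name `parse_hit_line`, the hash keys, and the assumption that every row ends with the nine fields Prob, E-value, P-value, Score, SS, Cols, Query HMM, Template HMM and a parenthesised template length are mine, not part of the original answer. Because the Hit description itself contains spaces, the regex anchors on those trailing fields and treats everything between the row number and Prob as the description.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one hit row into a hash reference.
# Returns undef (in scalar context) when the line does not look like a hit row.
sub parse_hit_line {
    my ($line) = @_;

    # A plain or scientific-notation number, e.g. 100.0 or 1.6E-38
    my $num = qr/-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?/;

    return unless $line =~ /^\s*(\d+)\s+(.+?)\s+($num)\s+($num)\s+($num)\s+($num)\s+($num)\s+(\d+)\s+(\d+-\d+)\s+(\d+-\d+)\s*\((\d+)\)\s*$/;

    return {
        no           => $1,
        hit          => $2,   # free-text description, may contain spaces
        prob         => $3,
        evalue       => $4,
        pvalue       => $5,
        score        => $6,
        ss           => $7,
        cols         => $8,
        query_hmm    => $9,
        template_hmm => $10,
        template_len => $11,
    };
}

# Example using a row copied from the question's sample input
my $row = parse_hit_line(
    '1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)'
);
print "hit: $row->{hit}\tE-value: $row->{evalue}\n" if $row;
```

Lines that do not match (the header line, blank lines) come back as undef, so the caller can skip them or report them as malformed.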
Answer 1: (score: 0)
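I think what you are trying to accomplish can be done more simply. As I understand it, you want to parse the first line, skip ahead to the table header, and then process the table rows that follow it.
If so, you could do it like this: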
open (IN, $hhr_report) or die "cannot open $hhr_report\n";
# Get the first line of the file and process it:
my $first_line = <IN>;
my @query=split(/\|/,$first_line);
my $vogL=$query[0];
my @vogL2=split(/\s+/,$vogL);
my $vog=$vogL2[1];
$vog=~ s/\[//g; #/
$vog=~ s/\]//g; #/
print "query_array:\t@query\n";
print "query_vog:\t$vog\n";
# Work on the rest of the file:
my $in_table = 0;
while (my $line=<IN>){
    if ($in_table) {
        # process your columns here
        print "$line\n";
        my @columns = split(/\s+/,$line);
        ... # the rest of your processing
    }
    # read (and throw away) lines until you match the table header:
    $in_table = 1 if $line =~/Query HMM/;
    # next time through the while loop you'll have your
    # first tabular data and the $in_table will be true
}
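For reference, here is a self-contained sketch of the same flag approach that can be run without an input file. It reads the sample report from the question through an in-memory filehandle; the sub name `read_report` is hypothetical, and the rule that the table ends at the first blank line (or at end of file) is an assumption that the sample input does not confirm.

```perl
use strict;
use warnings;

# Sample report copied from the question; a real script would open the file instead.
my $report = <<'END';
Query [VOG0001]|NC_002014-NP_040572.1| 1296..1562 + 88 aa|G V protein
Match_columns 100
No_of_seqs 7 out of 16
Neff 2.6
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
2 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
3 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43 221.5 0.0 87 2-89 1-87 (87)
END

# Hypothetical helper: returns the query ID and a reference to the table rows.
sub read_report {
    my ($fh) = @_;
    my ($vog, @rows);
    my $in_table = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if ($in_table) {
            # Assumption: a blank line marks the end of the table
            last unless $line =~ /\S/;
            push @rows, $line;
        }
        elsif ($line =~ /^Query\s+\[([^\]]+)\]/) {
            # Capture the ID between the square brackets on the Query line
            $vog = $1;
        }
        elsif ($line =~ /Query HMM/) {
            # Header seen: every following line is tabular data
            $in_table = 1;
        }
    }
    return ($vog, \@rows);
}

open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my ($vog, $rows) = read_report($fh);
close $fh;

print "query_vog:\t$vog\n";
print "row:\t$_\n" for @$rows;
```

Because the flag is checked before the header test, the header line itself is never stored as a row, which is the behaviour the question was after.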