我对Perl很新,所以如果这有些不成熟,我很抱歉。
我正在使用Perl脚本作为Python,文本格式等的包装器,我正在努力获得我想要的输出。
该脚本采用一个文件夹,对于此示例,该文件夹包含6个文本文件(test1.txt到test6.txt)。然后,该脚本从文件中提取一些信息,运行一系列命令行程序,然后输出制表符分隔的结果。但是,该结果仅包含通过脚本处理的其余部分(即结果)的结果。
以下是我目前所拥有的一些片段:
use strict;
use warnings;
## create array to capture all of the file names from the folder
opendir(DIR, $folder) or die "couldn't open $folder: $!\n";
my @filenames = grep { /\.txt$/ } readdir DIR;
closedir DIR;
#here I run some subroutines, the last one looks like this
my $results = `blastn -query $shortname.fasta -db DB/$db -outfmt "6 qseqid sseqid score evalue" -max_target_seqs 1`;
#now I would like to compare what is in the @filenames array with $results
制表符分隔结果示例 - 存储在$results
:
test1.txt 200 1:1-20 79 80
test3.txt 800 1:1-200 900 80
test5.txt 900 1:1-700 100 2000
test6.txt 600 1:1-1000 200 70
我希望最终输出包含通过脚本运行的所有文件,所以我想我需要一种比较两个数组或者将数组与哈希进行比较的方法?
所需输出的示例:
test1.txt 200 1:1-20 79 80
test2.txt 0 No result
test3.txt 800 1:1-200 900 80
test4.txt 0 No result
test5.txt 900 1:1-700 100 2000
test6.txt 600 1:1-1000 200 70
更新
好的,所以我通过将文件读入哈希然后进行比较,得到了@terdon的建议。所以我试图找出如何通过写入文件和重新读取文件来做到这一点 - 我似乎仍然无法使语法正确。这就是我所拥有的,但似乎我无法将数组与哈希匹配 - 这意味着哈希必须是正确的:
#!/usr/bin/env perl
use strict;
use warnings;
#create variable to mimic blast results
my $blast_results = "file1.ab1 9 350 0.0 449 418 418 403479 403042 567
file3.ab1 2 833 0.0 895 877 877 3717226 3718105 984";
#create array to mimic filename array
my @filenames = ("file1.ab1", "file2.ab1", "file3.ab1");
#header for file
my $header = "Query\tSeq_length\tTarget found\tScore (Bits)\tExpect(E-value)\tAlign-length\tIdentities\tPositives\tChr\tStart\tEnd\n";
#initialize hash
my %hash;
#split blast results into array
my @row = split(/\s+/, $blast_results);
$hash{$row[0]}=$_;
print $header;
foreach my $file (@filenames){
## If this filename has an associated entry in the hash, print it
if(defined($hash{$file})){
print "$row[0]\t$row[9]\t$row[1]:$row[7]-$row[8]\t$row[2]\t$row[3]\t$row[4]\t$row[5]\t$row[6]\t$row[1]\t$row[7]\t$row[8]\n";
}
## If not, print this.
else{
print "$file\t0\tNo Blast Results: Sequencing Rxn Failed\n";
}
}
print "-----------------------------------\n";
print "$blast_results\n"; #test what results look like
print "-----------------------------------\n";
print "$row[0]\t$row[1]\n"; #test if array is getting split correctly
print "-----------------------------------\n";
print "$filenames[2]\n"; #test if other array present
此脚本的结果是(@filenames
数组与哈希值不匹配):
Query Seq_length Target found Score (Bits) Expect(E-value) Align-length Identities Positives Chr Start End
file1.ab1 0 No Blast Results: Sequencing Rxn Failed
file2.ab1 0 No Blast Results: Sequencing Rxn Failed
file3.ab1 0 No Blast Results: Sequencing Rxn Failed
-----------------------------------
file1.ab1 9 350 0.0 449 418 418 403479 403042 567
file3.ab1 2 833 0.0 895 877 877 3717226 3718105 984
-----------------------------------
file1.ab1 9
-----------------------------------
file3.ab1
答案 0 :(得分:-1)
我不完全确定你需要什么,但是相当于awk的A[$1]=$0
是使用Perl中的哈希来完成的。类似的东西:
my %hash;
## Open the output file
open(my $fh, "<","text_file");
while(<$fh>){
## remove newlines
chomp;
## split the line
my @A=split(/\s+/);
## Save this in a hash whose keys are the 1st fields and whose
## values are the associated lines.
$hash{$A[0]}=$_;
}
close($fh);
## Now, compare the file to @filenames
foreach my $file (@filenames){
## Print the file name
print "$file\t";
## If this filename has an associated entry in the hash, print it
if(defined($hash{$file})){
print "$hash{$file}\n";
}
## If not, print this.
else{
print "0\tNo result\n";
}
}