我有一个制表符分隔的输入文件,格式为:
+ Chr1 www
- Chr2 zzz
...
我想逐行针对以下格式的参考标签分隔文件(以下代码中的TRANSCRIPTS):
Chr1 + xxx UsefulInfo1
Chr2 - yyy UsefulInfo2
...
并希望输出如下所示:
+ Chr1 UsefulInfo1
- Chr2 UsefulInfo2
...
这是我尝试从命令行获取变量名,从输入文件中获取某些信息并从参考文件中添加有用的信息:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $inFile = $ARGV[0];
my $outFile = $ARGV[1];
open(INFILE, "<$inFile") || die("Couldn't open $inFile: $!\n");
open(OUTFILE, ">$outFile") || die("Couldn't create $outFile: $!\n");
open(TRANSCRIPTS, "</path/TranscriptInfo") || die("Couldn't open reference file!");
my @transcripts = split(/\t+/, <TRANSCRIPTS>);
chomp @transcripts;
#Define desired information from input for later
while (my @columns = split(/\t+/, <INFILE>)) {
chomp @columns;
my $strand = $columns[0];
my $chromosome = $columns[1];
#Attempt to search reference file line by line for matching criteria and copying a column of matching lines
foreach my $reference(@transcripts) {
my $refChr = $reference[0]; #Error for this line
my $refStrand = $reference[1]; #Error for this line
if ($refChr eq $chromosome && $refStrand eq $strand) {
my $info = $reference[3]; #Error for this line
print OUTFILE "$strand\t$chromosome\t\$info\n";
}
}
}
close(OUTFILE); close(INFILE);
目前,我收到“全局符号“ @reference”需要明确的软件包名称。”定义此的正确方法是什么?即使正确定义了符号,我也无法完全确定我的foreach循环是否可以按预期运行。
任何一般性建议也将不胜感激。原谅我的无知,长期潜伏,没有我自己能教的正规编码训练。
答案 0 :(得分:1)
已修复:
use strict;
use warnings;
use feature qw( say );
my $in_qfn = $ARGV[0];
my $out_qfn = $ARGV[1];
my $transcripts_qfn = "/path/TranscriptInfo";
my @transcripts;
{
open(my $transcripts_fh, "<", $transcripts_qfn)
or die("Can't open \"$transcripts_qfn\": $!\n");
while (<$transcripts_fh>) {
chomp;
push @transcripts, [ split(/\t/, $_, -1) ];
}
}
{
open(my $in_fh, "<", $in_qfn)
or die("Can't open \"$in_qfn\": $!\n");
open(my $out_fh, ">", $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
while (<$in_fh>) {
chomp;
my ($strand, $chr) = split(/\t/, $_, -1);
for my $transcript (@transcripts) {
my $ref_chr = $transcript->[0];
my $ref_strand = $transcript->[1];
if ($chr eq $ref_chr && $strand eq $ref_strand) {
my $info = $transcript->[2];
say $out_fh join("\t", $strand, $chr, $info);
}
}
}
}
也就是说,上面的方法效率很低。让我们将$ transcript_qfn中的行数称为N,将我们将$ in_qfn中的行数称为M。内部循环执行等于N * M的次数。实际上,它只需要执行N次。
use strict;
use warnings;
use feature qw( say );
my $in_qfn = $ARGV[0];
my $out_qfn = $ARGV[1];
my $transcripts_qfn = "/path/TranscriptInfo";
my %to_print;
{
open(my $in_fh, "<", $in_qfn)
or die("Can't open \"$in_qfn\": $!\n");
while (<$in_fh>) {
chomp;
my ($strand, $chr) = split(/\t/, $_, -1);
++$to_print{$strand}{$chr};
}
}
{
open(my $transcript_fh, "<", $transcript_qfn)
or die("Can't open \"$transcript_qfn\": $!\n");
open(my $out_fh, ">", $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
while (<$transcript_fh>) {
chomp;
my ($ref_chr, $ref_strand, $info) = split(/\t/, $_, -1);
next if !$to_print{$ref_strand};
next if !$to_print{$ref_strand}{$ref_chr};
say $out_fh join("\t", $ref_strand, $ref_chr, $info);
}
}