Question

I have the following .gtf file, and I only need to extract four variables (chromosome, start/stop codon, and transcript ID).

1       Cufflinks       transcript      11869   14412   1000    +       .      gene_id "CUFF.1"; transcript_id "CUFF.1.2"; FPKM "0.3750000000"; frac "0.000000"; conf_lo "0.375000"; conf_hi "0.375000"; cov "1.470346"; full_read_support "yes";
1       Cufflinks       transcript      11869   14412   444     +       .      gene_id "CUFF.1"; transcript_id "CUFF.1.3"; FPKM "0.1666666667"; frac "0.000000"; conf_lo "0.166667"; conf_hi "0.166667"; cov "0.653487"; full_read_support "yes";
2       Cufflinks       transcript      11869   14412   333     +       .      gene_id "CUFF.1"; transcript_id "CUFF.1.4"; FPKM "0.1250000000"; frac "0.000000"; conf_lo "0.125000"; conf_hi "0.125000"; cov "0.490115"; full_read_support "yes";

My question is: how does the script know to work on the selected file?

Would you use:

(1) my $file = 'transcripts_selected.gtf';

(2) Can this script also be used to extract the selected data:

say $data->{"chromosome_number"}->{"start_codon"}->{"stop_codon"}->{"transcript_id"};

or should I use the:

BioSeq->new(-chromosome_number, -start_codon...) method?

(3) Finally, this script is taken from the BioPerl HOWTO:

my $seq_in  = Bio::SeqIO->new( -file => "<$infile",  -format => $infileformat );
my $seq_out = Bio::SeqIO->new( -file => ">$outfile", -format => $outfileformat );
while ( my $inseq = $seq_in->next_seq ) {
    $seq_out->write_seq($inseq);
}

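Where it uses the variables $infile and $outfile, am I supposed to put the name of the .gtf file in place of $infile, and the name of the new file with the selected data in place of $outfile?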

Answer 1

The simplest way to specify the file names is to write something like:

my $infile = shift;
my $outfile = shift;

in the script, and then type:

perl ScriptName transcripts_selected.gtf OutFileName

at the command line.
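To tie this back to the original goal, here is a minimal sketch in plain Perl (no BioPerl) that takes the two file names from the command line and writes only the four requested values. It rests on a few assumptions that are not in the question: "start/stop" is taken to mean columns 4 and 5 of the GTF line (the feature's start and end coordinates), the first eight columns contain no embedded spaces so they can be split on whitespace, and extract_fields is just a helper name made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return (chromosome, start, end, transcript_id) for one
# GTF line, or an empty list for comments, blank lines, and lines that carry
# no transcript_id attribute.
sub extract_fields {
    my ($line) = @_;
    return if $line =~ /^\s*#/ or $line !~ /\S/;

    # GTF columns: seqname source feature start end score strand frame attributes.
    # Assumption: splitting on whitespace with a limit of 9 keeps the whole
    # attribute string together as the ninth field.
    my ($chr, $start, $end, $attr) = (split ' ', $line, 9)[0, 3, 4, 8];
    return unless defined $attr;

    my ($tid) = $attr =~ /transcript_id "([^"]+)"/;
    return unless defined $tid;

    return ($chr, $start, $end, $tid);
}

# The file names come from the command line via shift.
if (@ARGV == 2) {
    my $infile  = shift;
    my $outfile = shift;

    open my $in,  '<', $infile  or die "Cannot read $infile: $!";
    open my $out, '>', $outfile or die "Cannot write $outfile: $!";

    while (my $line = <$in>) {
        my @fields = extract_fields($line) or next;   # skip unusable lines
        print {$out} join("\t", @fields), "\n";
    }

    close $in;
    close $out or die "Cannot close $outfile: $!";
}
else {
    warn "Usage: perl ScriptName transcripts_selected.gtf OutFileName\n";
}
```

Run it exactly as shown above (perl ScriptName transcripts_selected.gtf OutFileName). For the first sample line in the question, the output row would hold 1, 11869, 14412 and CUFF.1.2, separated by tabs.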

Using Perl to extract information from a .gtf file into a new text file
