Calling arrays

Time: 2015-10-09 01:58:07

Tags: arrays perl bioinformatics

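OK, so I have a bunch of file names in one of the following two formats:

Sample-ID_Adapter-Sequence_L001_R1_001.fastq (forward)

Sample-ID_Adapter-Sequence_L001_R2_001.fastq (reverse)

The only difference between the forward and reverse formats is the R1 or R2 element in the file name. So far, I have managed to let the user supply the directory containing these files with the following script: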

#!/usr/bin/perl
use strict;
use warnings;

#Print Directory

print "Please provide the directory containing the FASTQ files from your Illumina MiSeq run \n";
my $FASTQ = <STDIN>;
chomp ($FASTQ);

#Open Directory

my $dir = $FASTQ;
opendir(DIR, $dir) or die "Cannot open $dir: $!";
my @forwardreads = grep { /R1_001.fastq/ } readdir DIR;
closedir DIR;

my $direct = $FASTQ;
opendir(DIR, $direct) or die "Cannot open $dir: $!";
my @reversereads = grep { /R2_001.fastq/ } readdir DIR;
closedir DIR;

foreach my $ffile (@forwardreads) {
    my $forward = $ffile;
    print $forward;
    }

foreach my $rfile (@reversereads) {
    my $reverse = $rfile;
    print $reverse;
    }

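Question

What I would like to do with the script above is find a way to pair up the elements of the two arrays that derive from the same sample ID. As I said, the only difference between the forward and reverse files (from the same sample ID) is the R1 or R2 part of the file name.

I have tried looking for ways to extract elements from the arrays, but I want the program to do the matching instead of me.

Thanks for reading, and I hope you can help!

1 answer:

Answer 0: (score: -1)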

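You will have to parse the file names. Fortunately, that is quite simple. After stripping off the extension, you can split the pieces on _.

# Strip the file extension, keeping it in $suffix.
my($suffix) = $filename =~ /\.([^.]+)$/;
$filename =~ s/\.[^.]+$//;

# Parse Sample-ID_Adapter-Sequence_L001_R1_001
my($sample_id, $adapter_sequence, $uhh, $format, $yeah) = split /_/, $filename;

Wrapped in a function, that becomes: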

use File::Basename qw(fileparse);

sub parse_fastq_filename {
    # Read the next (in this case first and only) argument.
    my $path = shift;

    # Strip the directory and the suffix, so that dots or underscores
    # in the directory name cannot confuse the parsing below.
    my($basename, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

    # Parse Sample-ID_Adapter-Sequence_L001_R1_001
    my($sample_id, $adapter_sequence, $uhh, $format, $yeah) = split /_/, $basename;

    return {
        filename            => $path,
        sample_id           => $sample_id,
        adapter_sequence    => $adapter_sequence,
        uhh                 => $uhh,
        format              => $format,
        yeah                => $yeah
    };
}

Now you can do what you like with them.

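I would suggest a few things to improve the code. First, put the filename parsing into a function so that it can be reused and the main code stays simpler. Second, parse the filename into a hash rather than a bunch of scalars; it will be easier to work with and to pass around. Finally, include the filename itself in that hash, so that the hash contains the complete data. This, by the way, is the gateway drug to OO programming.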
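To illustrate why a hash is easy to pass around, here is a minimal sketch of a helper that takes one parsed filename and rebuilds the name of its mate (R1 for R2 and vice versa). The name mate_filename and the sample values are hypothetical and not part of the code above; the sketch only assumes the hash keys returned by parse_fastq_filename and a .fastq extension.

```perl
use strict;
use warnings;

# Hypothetical helper: given a parsed-filename hash ref (same keys as
# parse_fastq_filename returns), build the file name of its mate.
sub mate_filename {
    my $fastq = shift;

    # R1 pairs with R2 and vice versa.
    my %mate_of = (R1 => 'R2', R2 => 'R1');
    my $mate = $mate_of{ $fastq->{format} }
        or die "Unknown read direction: $fastq->{format}";

    # Reassemble the pieces in their original order.
    return join('_',
        @{$fastq}{qw(sample_id adapter_sequence uhh)},
        $mate,
        $fastq->{yeah},
    ) . '.fastq';
}

# Made-up values, for illustration only.
my $fastq = {
    sample_id        => 'S1',
    adapter_sequence => 'ACGT',
    uhh              => 'L001',
    format           => 'R1',
    yeah             => '001',
};

print mate_filename($fastq), "\n";
```

Because everything the helper needs travels in one hash reference, its signature stays the same even if more fields are parsed out of the file name later.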

Then, rather than finding the forward and reverse files separately, process everything in one loop. Put the matching forward and reverse pairs in a hash. Use glob to pick up only the .fastq files.

# This is where the pairs of files will be stored.
my %pairs;

# List just the *.fastq files
while( my $filename = glob("$FASTQ_DIR/*.fastq")) {
    # Parse the filename into a hash reference
    my $fastq = parse_fastq_filename($filename);

    # Put each parsed fastq filename into its pair
    $pairs{ $fastq->{sample_id} }{ $fastq->{format} } = $fastq;
}

Then you can do what you like with %pairs. Here is an example that prints each sample ID and its formats.

# say needs "use feature 'say';" (or "use v5.10;") near the top of the script.

# Iterate through each sample and pair.
# $sample is a hash ref of format pairs
for my $sample (values %pairs) {
    # Now iterate through each pair in the sample
    for my $fastq (values %$sample) {
        say "$fastq->{sample_id} has format $fastq->{format}";
    }
}
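Since the original goal was to match each forward file with its reverse file, the same grouping idea can also be used to report complete pairs and flag samples whose mate is missing. The following is only a sketch under a few assumptions: pair_fastq_files is a hypothetical name, the file names are made up and passed in as a list instead of being read from a directory, and sample IDs are assumed not to contain underscores.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Group file names by sample ID and read direction (R1/R2).
# Returns a hash ref: { sample_id => { R1 => path, R2 => path } }
sub pair_fastq_files {
    my @paths = @_;
    my %pairs;
    for my $path (@paths) {
        # Drop the directory and the .fastq suffix before splitting.
        my ($name) = fileparse($path, qr/\.fastq/);
        my ($sample_id, $adapter, $lane, $read) = split /_/, $name;

        # Skip anything that does not look like an R1/R2 file.
        next unless defined $read && $read =~ /^R[12]$/;

        $pairs{$sample_id}{$read} = $path;
    }
    return \%pairs;
}

# Hypothetical file names, for illustration only.
my $pairs = pair_fastq_files(qw(
    run/S1_ACGT_L001_R1_001.fastq
    run/S1_ACGT_L001_R2_001.fastq
    run/S2_TTGA_L001_R1_001.fastq
));

for my $sample_id (sort keys %$pairs) {
    my ($r1, $r2) = @{ $pairs->{$sample_id} }{qw(R1 R2)};
    if (defined $r1 && defined $r2) {
        print "$sample_id: $r1 <-> $r2\n";
    }
    else {
        print "$sample_id: missing ", (defined $r1 ? 'R2' : 'R1'), " file\n";
    }
}
```

With a real run, the list passed to pair_fastq_files could come from glob on the user-supplied directory, as in the loop above.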

eval