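OK, so I have a bunch of file names that follow one of these two formats:
Sample-ID_Adapter-Sequence_L001_R1_001.fastq (forward)
Sample-ID_Adapter-Sequence_L001_R2_001.fastq (reverse)
The only difference between the forward and reverse formats is the R1 or R2 element in the file name. So far, I've managed to have the user supply the directory containing these files with the following script: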
#!/usr/bin/perl
use strict;
use warnings;

# Print Directory
print "Please provide the directory containing the FASTQ files from your Illumina MiSeq run \n";
my $FASTQ = <STDIN>;
chomp ($FASTQ);

# Open Directory
my $dir = $FASTQ;
opendir(DIR, $dir) or die "Cannot open $dir: $!";
my @forwardreads = grep { /R1_001\.fastq/ } readdir DIR;
closedir DIR;
my $direct = $FASTQ;
opendir(DIR, $direct) or die "Cannot open $direct: $!";
my @reversereads = grep { /R2_001\.fastq/ } readdir DIR;
closedir DIR;

foreach my $ffile (@forwardreads) {
    my $forward = $ffile;
    print "$forward\n";
}
foreach my $rfile (@reversereads) {
    my $reverse = $rfile;
    print "$reverse\n";
}
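What I want to do with the script above is find a way to pair up the elements of the two arrays that come from the same Sample ID. As I said, the only difference between the forward and reverse files (from the same Sample ID) is the R1/R2 part of the file name.
I've looked for ways to pull elements out of the arrays, but I'd like the program to do the matching for me.
Thanks for reading, I hope you can help!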
Answer 0 (score: -1)
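You have to parse the file names. Fortunately, that's pretty simple. After stripping the extension, you can split on the underscores.
# Strip the file extension, keeping it in $suffix.
# s/// returns whether it matched, not the captured text, so take the extension from $1.
my $suffix = $filename =~ s{\.([^.]*)$}{} ? $1 : '';
# Parse Sample-ID_Adapter-Sequence_L001_R1_001
my($sample_id, $adapter_sequence, $uhh, $format, $yeah) = split /_/, $filename;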
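To make the split concrete, here is the same strip-and-split applied to the example name from the question. It is a minimal sketch that assumes neither the sample ID nor the adapter sequence contains an underscore of its own; if they can, splitting by position will pick up the wrong fields.

```perl
use strict;
use warnings;

# Example file name taken from the question.
my $filename = 'Sample-ID_Adapter-Sequence_L001_R1_001.fastq';

# Strip the extension; s/// returns true on success and leaves the capture in $1.
my $suffix = $filename =~ s{\.([^.]*)$}{} ? $1 : '';

# Split what is left on underscores; the fields are picked out by position.
my ($sample_id, $adapter_sequence, $uhh, $format, $yeah) = split /_/, $filename;

print "$sample_id $format $suffix\n";    # prints "Sample-ID R1 fastq"
```

Because the forward and reverse files differ only in $format, $sample_id is the value to pair them on.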
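Now you can do whatever you like with them.
I'd suggest a few ways to improve the code. First, put that file name parsing into a function so it can be reused and the main code stays simpler. Second, parse the file name into a hash rather than a bunch of scalars; that's easier to work with and to pass around. Finally, include the file name itself in that hash, so the hash holds the complete data. This, by the way, is the gateway drug to OO programming.
use File::Basename qw(fileparse);

sub parse_fastq_filename {
    # Read the next (in this case first and only) argument.
    my $filename = shift;

    # Split off the directory and the suffix, so that dots or underscores
    # in the directory path can't confuse the parsing below.
    my($name, $dir, $suffix) = fileparse($filename, qr/\.[^.]*/);

    # Parse Sample-ID_Adapter-Sequence_L001_R1_001
    my($sample_id, $adapter_sequence, $uhh, $format, $yeah) = split /_/, $name;

    return {
        filename         => $filename,    # the original, unmodified name
        suffix           => $suffix,
        sample_id        => $sample_id,
        adapter_sequence => $adapter_sequence,
        uhh              => $uhh,
        format           => $format,
        yeah             => $yeah
    };
}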
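Then, instead of finding the forward and reverse files separately, process everything in one loop and put the matching forward/reverse pairs in a hash. Use glob to get only the .fastq files.
# This is where the pairs of files will be stored.
my %pairs;

# List just the *.fastq files in the directory the user typed in
# ($FASTQ from the question's script). In scalar context, glob returns
# one match per call until the matches run out.
while( my $filename = glob("$FASTQ/*.fastq") ) {
    # Parse the filename into a hash reference
    my $fastq = parse_fastq_filename($filename);

    # Put each parsed fastq filename into its pair
    $pairs{ $fastq->{sample_id} }{ $fastq->{format} } = $fastq;
}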
Then you can use %pairs to do whatever you like. Here's an example that prints each sample ID and its formats.
use feature 'say';    # say() needs this (or "use v5.10;")

# Iterate through each sample and pair.
# $sample is a hash ref of format pairs
for my $sample (values %pairs) {
    # Now iterate through each pair in the sample
    for my $fastq (values %$sample) {
        say "$fastq->{sample_id} has format $fastq->{format}";
    }
}
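If all you need is to line up each R1 file with its R2 mate, a lighter alternative (a different technique from the field-splitting above) is to key every file on everything except the read number, since that is the only part the question says differs. This is a sketch under the assumption that every name ends in _R1_001.fastq or _R2_001.fastq; pair_reads and the sample file names are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: group file names by everything except R1/R2.
# Assumes names end in _R1_001.fastq or _R2_001.fastq, as in the question.
sub pair_reads {
    my @filenames = @_;
    my %pairs;
    for my $name (@filenames) {
        # $1 = shared prefix (sample ID, adapter sequence, lane), $2 = read number
        next unless $name =~ /^(.+)_R([12])_001\.fastq$/;
        $pairs{$1}{"R$2"} = $name;
    }
    return \%pairs;
}

# Hypothetical directory listing, including one unpaired file and one non-FASTQ file.
my @files = (
    'SampleA_ACGT_L001_R1_001.fastq',
    'SampleA_ACGT_L001_R2_001.fastq',
    'SampleB_TTGA_L001_R1_001.fastq',
    'notes.txt',
);
my $pairs = pair_reads(@files);

# Report complete pairs, and warn about any file whose mate is missing.
for my $key (sort keys %$pairs) {
    my ($r1, $r2) = @{ $pairs->{$key} }{qw(R1 R2)};
    if (defined $r1 && defined $r2) {
        print "$key: $r1 <-> $r2\n";
    } else {
        warn "$key is missing its mate\n";
    }
}
```

In the question's script, a single call such as pair_reads(readdir DIR) could take the place of the two separate grep passes over the directory.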