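Perl: Combine two arrays, remove duplicate headers, keep the format

Posted: 2016-11-27 21:22:31

Tags: arrays perl split mesh

Below are the contents of two arrays I generated. How can I combine the two arrays so that the duplicated header appears only once, while keeping the same layout, almost like building a matrix? I'm currently using mesh to merge the arrays into one, but it isn't working well. I haven't found anything else that helps (split, push, etc.). My code is shown below.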

Input file "phred.txt"

 "#$%&'()

Input file "bases.txt"

ABCDEFGH

Output when printing array 1

Sequence_1 
1     2     3     4    5

Output when printing array 2

Sequence_1 
A     B     C     D    E

Desired output from combining the two arrays

Sequence_1
1     2     3     4     5
A     B     C     D     E

Current result using mesh

Sequence_1
Sequence_1
1A     2B     3C     4D     5E

Current code

use warnings;
use strict;

use List::MoreUtils qw(mesh);

open( PHRED, '<', '/path/to/phred.txt' ) or die $!;
open( BASES, '<', '/path/to/bases.txt' ) or die $!;
open( OUT,   '>', '/path/to/out.txt' )   or die $!;

my @symbols;
my @bases;
my $count = 0;
my @finalphred;
my @finalbases;

my %hash = (
    '"'  => "1",
    '#'  => "2",
    '$'  => "3",
    '%'  => "4",
    '&'  => "5",
    q(') => "6",
    '('  => "7",
    ')'  => "8"
);

while ( my $fastq = <PHRED> ) {
    my $substring = substr( $fastq, 0, 5 );
    push( @symbols, $substring );
}

foreach ( @symbols ) {

    my @eachsymbol = split //, $_;
    $count++;
    push( @finalphred, "\n", "Sequence_$count\n" );

    foreach my $symbol ( @eachsymbol ) {
        if ( exists( $hash{$symbol} ) ) {
            push( @finalphred, $hash{$symbol}, "\t" );
        }
    }
}

my $count_again = 0;

while ( my $fastq_again = <BASES> ) {
    my $substring_again = substr( $fastq_again, 0, 5 );
    push( @bases, $substring_again );
}

foreach ( @bases ) {
    my @eachsymbol_again = split //, $_;
    $count_again++;
    push( @finalbases, "\n", "Sequence_$count_again\n" );
    foreach my $symbol_again ( @eachsymbol_again ) {
        push( @finalbases, $symbol_again, "\t" );
    }
}

foreach ( @finalphred ) {    # diagnostic: print array contents
    print $_;
}
foreach ( @finalbases ) {    # diagnostic: print array contents
    print $_;
}
my @last = mesh @finalphred, @finalbases;

print OUT @last;

Thanks for helping me finish this code and get the correct output!

3 answers:

Answer 0 (score: 1)

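One of the main problems is that you never print anything from @eachsymbol_again. You split each four-character string into four characters and put them into that array, then ignore it. It certainly doesn't produce the output you say it does.

Also, mesh is a strange choice for combining your arrays the way you are doing it.

For reference, your arrays look like this: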

@finalphred

(
  "\n",
  "Sequence_1\n",
  1,
  "\t",
  2,
  "\t",
  3,
  "\t",
  4,
  "\t",
  "\n",
  "Sequence_2\n",
  5,
  "\t",
  6,
  "\t",
  7,
  "\t",
  8,
  "\t",
)

@finalbases

(
  "\n",
  "Sequence_1\n",
  "\n",
  "Sequence_2\n"
)

The two arrays don't even have the same number of elements, so calling mesh on them doesn't make much sense.
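
To see concretely why mesh produces a doubled header, here is a minimal sketch using a hand-written interleave sub. It is a hypothetical stand-in (not part of List::MoreUtils) that mimics how mesh interleaves arrays element by element and pads the shorter array with undef. The two arrays are shortened versions of the ones listed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for List::MoreUtils::mesh: element 0 of every
# array, then element 1 of every array, and so on. Once a shorter array
# runs out, it contributes undef.
sub interleave {
    my @arrays = @_;    # array references
    my $max    = 0;
    for my $aref (@arrays) {
        $max = @$aref if @$aref > $max;
    }
    my @out;
    for my $i ( 0 .. $max - 1 ) {
        push @out, map { $_->[$i] } @arrays;
    }
    return @out;
}

# Abbreviated versions of the two arrays shown above.
my @finalphred = ( "\n", "Sequence_1\n", 1, "\t", 2, "\t" );
my @finalbases = ( "\n", "Sequence_1\n", "\n", "Sequence_2\n" );

my @last   = interleave( \@finalphred, \@finalbases );
my $undefs = grep { !defined } @last;

# The two "Sequence_1\n" headers land next to each other (elements 2 and 3),
# and the slots where @finalbases ran out are undef.
printf "%d elements after interleaving, %d of them undef\n", scalar @last, $undefs;
# prints: 12 elements after interleaving, 2 of them undef
```

Because each header is just another array element, any element-by-element merge places both copies side by side; the header has to be emitted once, outside the merge.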


Update

Here is a working program.

I used the following data:

phred.txt

"#$%
&'()

bases.txt

ABCD
EFGH

Perl code

use strict;
use warnings 'all';
use autodie;

my %xlate = map { chr($_ + 33) => $_ } 1 .. 8;

open my $phred_fh, '<', 'phred.txt';
open my $bases_fh, '<', 'bases.txt';

my $n;

until ( eof $phred_fh or eof $bases_fh ) {

    my @syms = map [ split //, substr <$_>, 0, 4 ], $phred_fh, $bases_fh;

    printf "Sequence_%d\n", ++$n;
    print join("\t", map $xlate{$_}, @{$syms[0]}), "\n";
    print join("\t", @{$syms[1]}), "\n";
    print "\n";
}

Output

Sequence_1
1   2   3   4
A   B   C   D

Sequence_2
5   6   7   8
E   F   G   H
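
The %xlate table above only covers eight characters. Since chr($_ + 33) maps each score n to the character with code n + 33, the reverse is ord($char) - 33, which gives the same numbers for '"' through ')' and works for any other quality character. This sketch assumes the quality strings use the Phred+33 offset (the Sanger / Illumina 1.8+ FASTQ convention), which matches the tables in this thread; the sub names phred_scores and format_block are hypothetical.

```perl
use strict;
use warnings;

# Assumption: Phred+33 encoding, so '"' (chr 34) decodes to 1.
sub phred_scores {
    my ($quality) = @_;
    return map { ord($_) - 33 } split //, $quality;
}

# One block in the same layout as the program above:
# header, tab-separated scores, tab-separated bases.
sub format_block {
    my ( $n, $quality, $bases ) = @_;
    return sprintf "Sequence_%d\n%s\n%s\n", $n,
        join( "\t", phred_scores($quality) ),
        join( "\t", split //, $bases );
}

print format_block( 1, q("#$%), 'ABCD' );
```

Computing the score arithmetically avoids maintaining a lookup table when the files contain higher quality values than the sample data.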

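Answer 1 (score: 0)

I don't think you need mesh at all for this. It is simpler to read the files into arrays, process them, and then write them to the output file with formatting. If the files are too large to fit in main memory, you can process them line by line instead.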
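
A minimal sketch of the line-by-line approach: it reads one line from each handle per iteration instead of loading both files into arrays first. To keep it self-contained it uses in-memory filehandles holding the sample data from Answer 0; combine_streams is a hypothetical name, and the sketch assumes Phred+33 quality characters and files whose lines correspond one to one (it stops at the end of the shorter file).

```perl
use strict;
use warnings;

# Read both handles in lockstep so neither file is held in memory.
# Returns the number of sequences written.
sub combine_streams {
    my ( $phred_fh, $bases_fh, $out_fh ) = @_;
    my $n = 0;
    while ( defined( my $qual = <$phred_fh> ) and defined( my $seq = <$bases_fh> ) ) {
        chomp( $qual, $seq );
        printf {$out_fh} "Sequence_%d\n%s\n%s\n",
            ++$n,
            join( "\t", map { ord($_) - 33 } split //, $qual ),    # Phred+33 assumed
            join( "\t", split //, $seq );
    }
    return $n;
}

# Sample data in in-memory files; real code would open phred.txt / bases.txt.
my $phred_data = <<'END';
"#$%
&'()
END
my $bases_data = "ABCD\nEFGH\n";

open my $phred_fh, '<', \$phred_data or die $!;
open my $bases_fh, '<', \$bases_data or die $!;
my $combined = '';
open my $out_fh, '>', \$combined or die $!;

my $count = combine_streams( $phred_fh, $bases_fh, $out_fh );
close $out_fh;
print $combined;
```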

#!/usr/bin/perl
use warnings;
use strict;

open( PHRED, '<', 'phred.txt' ) or die $!;
open( BASES, '<', 'bases.txt' ) or die $!;
open( OUT,   '>', 'out.txt' )   or die $!;

my @finalphred;
my @finalbases;

my %hash = (
    '"'  => "1",
    '#'  => "2",
    '$'  => "3",
    '%'  => "4",
    '&'  => "5",
    q(') => "6",
    '('  => "7",
    ')'  => "8"
);

while ( my $fastq = <PHRED> ) {
    chomp $fastq;
    my @items = split //, $fastq;
    my @phreds = map {$hash{$_}} grep {exists $hash{$_}} @items;
    push (@finalphred, \@phreds);
}

while ( my $fastq_again = <BASES> ) {
    chomp $fastq_again;
    my @items = split //, $fastq_again;
    push(@finalbases, \@items);
}

for my $i (0 .. $#finalbases) {
    # Use an empty list if phred.txt has fewer lines than bases.txt,
    # instead of dereferencing undef (a fatal error under strict refs).
    my $phreds = $finalphred[$i] || [];
    if (@{$finalbases[$i]} && @$phreds) {
        print OUT "Sequence_" . ($i + 1), "\n";
        printf OUT "%-6s" x scalar @$phreds, @$phreds;
        print OUT "\n";
        printf OUT "%-6s" x scalar @{$finalbases[$i]}, @{$finalbases[$i]};
        print OUT "\n";
    }
    else {
        print "The two arrays don't contain the same number of elements\n";
    }
}
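
The printf calls above build their format string with Perl's repetition operator: "%-6s" x N repeats the format once per element, giving one left-aligned, six-character-wide column each. A small sketch with a hypothetical format_row helper shows what that expression produces:

```perl
use strict;
use warnings;

# "%-6s" x @cells: the array is in scalar context on the right of x,
# so the format is repeated once per cell.
sub format_row {
    my @cells = @_;
    return sprintf( "%-6s" x @cells, @cells );
}

print format_row( 1 .. 4 ), "\n";
print format_row( qw(A B C D) ), "\n";
```

Every column, including the last, is padded to six characters, so each line ends in trailing spaces; the tab-joined approach in Answer 0 avoids that.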

Answer 2 (score: 0)

Here is a solution in Perl 6:

#!/usr/bin/env perl6

subset File of Str where *.IO.f;

sub MAIN (File :$phred='phred.txt', File :$bases='bases.txt') {
    my $phred-fh = open $phred;
    my $bases-fh = open $bases;
    my %xlate    = map { chr($_ + 33) => $_ }, 1..8;

    for 1..* Z $phred-fh.IO.lines Z $bases-fh.IO.lines -> ($i, $score, $seq) {
        put join "\n",
            "Sequence_$i",
            (map { %xlate{$_} }, $score.comb).join("\t"),
            $seq.comb.join("\t");
    }
}