在Perl脚本中帮助@ array

时间:2014-05-10 10:54:42

标签: arrays perl file

我需要帮助@ perl中的脚本

我有以下文件" etichete":

0.000000 8.700000 speech_L1
8.710000 27.300000 speech_L2 
27.310000 27.600000 speech_L3 
31.210000 37.210000 speech_L4 
37.220000 61.210000 speech_L5

我需要创建" etichete.rttm"文件使用" etichete"像这样:

SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L1 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L2 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L3 <NA>
SPEAKER etichete 1 0.000 8.556 <NA> <NA> speech_L1 <NA>
SPEAKER etichete 1 8.556 21.063 <NA> <NA> speech_L2 <NA>
SPEAKER etichete 1 32.304 9.515 <NA> <NA> speech_L3 <NA>
SPEAKER etichete 1 42.049 0.767 <NA> <NA> speech_L1 <NA>

这是我的代码(我认为我的错误接近于创建矩阵@rttm):

#!/usr/bin/perl -w

use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
use autodie;  


open my $fh, "etichete" or die $!;

$nume="etichete"; 
my @file_array;
while (my $line = <$fh>) {
    chomp $line;
    my @line_array = split(/\s+/, $line);
    push (@file_array, \@line_array);
}

my @arr=@file_array;
my $arrSize = @arr;


@speakers=$arr[0][2];
$j=0;
while ($j < $arrSize)
{   
    push(@speakers, $arr[$j][2]);
    $j++;   
}

my @uniq;
foreach my $x (@speakers){
        push @uniq, $x if !grep{/^$x$/}@uniq;
}

my $s1= @uniq;
my @rttm=();

$contorlinie1=0;
while ($contorlinie1 < $s1){
     $rttm[$contorlinie1][0]="SPKR-INFO";
     $rttm[$contorlinie1][1]="$nume";
     $rttm[$contorlinie1][2]="1";   
     $rttm[$contorlinie1][3]="<NA>";
     $rttm[$contorlinie1][4]="<NA>";
     $rttm[$contorlinie1][5]="<NA>";
     $rttm[$contorlinie1][6]="unknown";
     $rttm[$contorlinie1][7]="$uniq[$contorlinie1]";
     $rttm[$contorlinie1][8]="<NA>";
    $contorlinie1++;
    }
$contorlinie2=$s1;
while ($contorlinie2 < $arrSize)
     {
        $rttm[$contorlinie2][0]="SPEAKER";
     $rttm[$contorlinie2][1]="$nume";
     $rttm[$contorlinie2][2]="1";   
     $rttm[$contorlinie2][3]="$arr[$contorlinie2][0]";
     $rttm[$contorlinie2][4]="$arr[$contorlinie2][1]";
     $rttm[$contorlinie2][5]="<NA>";
     $rttm[$contorlinie2][6]="<NA>";
     $rttm[$contorlinie2][7]="$arr[$contorlinie2][2]";
     $rttm[$contorlinie2][8]="<NA>";
    $contorlinie2++;
}


open my $fh1,">etichete.rttm" or die $!;
foreach(@rttm)
    {
    print $fh1 "$-\n";
    }
close $fh1; 

当我运行脚本时,它会在每一行创建一个填充零的文件,当我放入打印矩阵时,就像这样:

ARRAY(0x10b13d8)
ARRAY(0x10b14e0)
ARRAY(0x10b15e8)
ARRAY(0x1038f78)
ARRAY(0x1039080)

2 个答案:

答案 0 :(得分:1)

您的示例代码不会生成您显示的输出。您可能已将$_更改为$-

原因是@rttm是一个数组数组。您无法直接打印数组引用以获取内部数组,您必须首先取消引用它:

print $fh1 "@$_\n";

顺便说一句,当你use autodie时,or die之后无需添加open

答案 1 :(得分:-1)

此替代计划可能会对您有所帮助。

据我所知,您需要的是每个标有SPKR-INFO的独特发言人的输出记录,然后是标有SPEAKER的原始行的重新格式化版本。

您显示的输入数据似乎与您所需的输出不符。我的程序使用此输入

 0.000  8.556 speech_L1
 8.556 21.063 speech_L2
32.304  9.515 speech_L3
42.049  0.767 speech_L1

最大的变化是我放弃了@rttm数组,就像你面对它一样,你可以在进入时将每行打印到输出文件。

我还删除了遍历数组索引的笨拙while循环。因为除了访问数组元素之外不需要索引的值,只需直接对数组值进行交互就更简单,更清晰。

另请注意,如果您有autodie,则无需使用open测试or die...次来电的成功。

由于您已添加List::MoreUtils模块,因此我使用了uniq函数,而不是使用@uniq数组对其进行编码

use strict;
use warnings;
use autodie;

use List::MoreUtils qw(uniq);

open my $fh, '<', 'etichete';

my $nume = 'etichete';

my @file;
while (<$fh>) {
  push @file, [ split ];
}

my @unique_speakers = sort { $a cmp $b } uniq map $_->[2], @file;

open my $out, '>', 'etichete.rttm';

for my $speaker (@unique_speakers) {
  print $out join(' ', 'SPKR-INFO', $nume, '1', '<NA>', '<NA>', '<NA>', 'unknown', $speaker, '<NA>'), "\n";
}

for my $line (@file) {
  print $out join(' ', 'SPEAKER', $nume, '1', $line->[0], $line->[1], '<NA>', '<NA>', $line->[2], '<NA>'), "\n";
}

close $out;

<强>输出

SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L1 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L2 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L3 <NA>
SPEAKER etichete 1 0.000 8.556 <NA> <NA> speech_L1 <NA>
SPEAKER etichete 1 8.556 21.063 <NA> <NA> speech_L2 <NA>
SPEAKER etichete 1 32.304 9.515 <NA> <NA> speech_L3 <NA>
SPEAKER etichete 1 42.049 0.767 <NA> <NA> speech_L1 <NA>