Splitting an array line by line in Perl, then matching values with a regex to find differences

Posted: 2013-11-13 03:02:44

Tags: regex perl

My text file contains:

ID_REF  IDENTIFIER  GSM88918    GSM88914    GSM88919    GSM88915    GSM88917    GSM88913    GSM88916    GSM88912
IG_2146_3437147_3437252_rev_at  /start=3437147 /end=3437252 /direction=+ /description=intergenic region nan nan 43.7    50.1    nan nan nan 26.5
IG_415_642550_642629_fwd_at /start=642550 /end=642629 /direction=+ /description=intergenic region   2212.9  1795.1  1112.6  942.6   614.2   753.4   402.6   535.2
.
.
more lines like these

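My script reads the data and calculates the differences between the biofilm measurements (GSM88912, GSM88913, GSM88914 and GSM88915) and the suspended (planktonic) measurements (GSM88916, GSM88917, GSM88918 and GSM88919).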
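Because the header names every sample column, one option is to look up each column's position from the header instead of hard-coding it. This is a minimal sketch, assuming the columns are whitespace-separated as in the sample above; the `%offset` hash is a hypothetical name. Since the IDENTIFIER column spans several words in the data rows, each sample column is counted from the end of the line.

```perl
use strict;
use warnings;

# Header line as shown in the question (whitespace-separated columns).
my $header = 'ID_REF  IDENTIFIER  GSM88918    GSM88914    GSM88919    GSM88915    GSM88917    GSM88913    GSM88916    GSM88912';

# Sample groups taken from the question's description.
my @biofilm    = qw(GSM88912 GSM88913 GSM88914 GSM88915);
my @planktonic = qw(GSM88916 GSM88917 GSM88918 GSM88919);

my @hdr = split ' ', $header;

# Hypothetical lookup table: column name => negative index into a data row.
# The last header field maps to -1, the one before it to -2, and so on,
# which stays correct however many words the IDENTIFIER column has.
my %offset;
for my $i ( 0 .. $#hdr ) {
    $offset{ $hdr[$i] } = $i - @hdr;
}

print "$_ => $offset{$_}\n" for @biofilm, @planktonic;
```

With these offsets, `(split)[ $offset{GSM88916} ]` would pick the GSM88916 value out of a data row.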

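I intend to put the data in a hash keyed by gene name, e.g. IG_2146_3437147_3437252_rev_at, and store the 4 resulting differences (e.g. GSM88916 - GSM88912 = diff1) as its value. But when I run the regex, I only get the values from the first line.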

open(IN,"GDS2768.txt")||die $!;
my @arrayOfLines = <IN>;
#print @arrayOfLines;
close(IN);

# initialize variables
my $line;
my %hashGeneName;
my $geneName;
my @geneNames;
my $GSM88918;
my $GSM88914;
my $GSM88919;
my $GSM88915;
my $GSM88917;
my $GSM88913;
my $GSM88916;
my $GSM88912;

foreach $line (@arrayOfLines){
    chomp $line;
    # gene name, then eight values, each value "nan" or a number
    if ($line =~ /IG(\w+)\s.+?region\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s(\w+|\d+\.?\d*)\s/){
        $geneName = $1;
        $GSM88918 = $2;
        $GSM88914 = $3;
        $GSM88919 = $4;
        $GSM88915 = $5;
        $GSM88917 = $6;
        $GSM88913 = $7;
        $GSM88916 = $8;
        $GSM88912 = $9;
        print "$geneName : $GSM88918, $GSM88914, $GSM88919, $GSM88915, $GSM88917, $GSM88913, $GSM88916, $GSM88912\n";
    }
}

Output:

IG_2146_3437147_3437252_rev_at : nan, nan, 43.7, 50.1, nan, nan, nan, 26.5

I want it to print all the values from every matching line in the array. Please help.
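One possible reason a regex like the one above stops matching is that `\s` matches exactly one whitespace character, while the pasted sample shows runs of several spaces (for example after "region" on the second row); the trailing `\s` also demands whitespace at the very end of the line. The sketch below is hedged: it assumes the file really contains such whitespace runs, and it swaps the question's `(\w+|\d+\.?\d*)` for the simpler `(\S+)`, which accepts both "nan" and decimals. The variable names are illustrative.

```perl
use strict;
use warnings;

# Second data row from the question, with the runs of spaces it shows.
my $line = 'IG_415_642550_642629_fwd_at /start=642550 /end=642629 /direction=+ /description=intergenic region   2212.9  1795.1  1112.6  942.6   614.2   753.4   402.6   535.2';

# \s matches exactly ONE whitespace character, so a single \s cannot
# step over the three spaces that follow "region".
my $strict = $line =~ /region\s(\S+)/ ? 'match' : 'no match';

# Build eight (\S+) groups joined by \s+ (one or more whitespace chars).
my $eight = join '\s+', ('(\S+)') x 8;

# Gene name, then everything up to "region", then the eight values.
# \s*$ tolerates trailing whitespace, e.g. a \r left over after chomp
# on a file with Windows line endings.
my ( $gene, @vals ) = $line =~ /^(\S+)\s.+?region\s+$eight\s*$/;

print "single \\s: $strict\n";
print "with \\s+:  $gene: @vals\n";
```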

1 Answer:

Answer 0 (score: 0)

Just split each line on whitespace:

use strict;
use warnings;

while (<>) {
    next if $. == 1;    # skip the header line
    # first field (gene name) plus the last eight fields (sample values)
    my ( $geneName, @vals ) = (split)[ 0, -8 .. -1 ];
    print "$geneName: @vals\n";
}
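To see why the slice works even though the description column contains a varying number of words, here is the same slice applied to the first sample row. This is a sketch for illustration only; it stores the fields in an array (`@fields`, an illustrative name) so the total count can be inspected, whereas the answer slices the list returned by `split` directly.

```perl
use strict;
use warnings;

# First data row from the question.
$_ = 'IG_2146_3437147_3437252_rev_at  /start=3437147 /end=3437252 /direction=+ /description=intergenic region nan nan 43.7    50.1    nan nan nan 26.5';

# split with no arguments splits $_ on runs of whitespace (like split ' ').
my @fields = split;

# Index 0 is the gene name; -8 .. -1 are the last eight fields,
# i.e. the sample values, wherever the description happens to end.
my ( $geneName, @vals ) = @fields[ 0, -8 .. -1 ];

printf "%d fields in total\n", scalar @fields;
print "$geneName: @vals\n";
```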

Usage: perl script.pl inFile [>outFile]

The optional last part, >outFile, is shell redirection that sends the output to a file.

Output on your dataset:

IG_2146_3437147_3437252_rev_at: nan nan 43.7 50.1 nan nan nan 26.5
IG_415_642550_642629_fwd_at: 2212.9 1795.1 1112.6 942.6 614.2 753.4 402.6 535.2

The elements of @vals are the values you need to compute the differences.
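Building on that, here is a sketch that stores the four differences in a hash keyed by gene name, as the question intends. Assumptions, loudly: only diff1 = GSM88916 - GSM88912 is given in the question, so pairing the remaining columns in the same order (GSM88917 - GSM88913, GSM88918 - GSM88914, GSM88919 - GSM88915) is a guess, and a difference involving "nan" is stored as `undef`. The sample rows are inlined so the block runs on its own; in practice the loop would read the file as in the answer.

```perl
use strict;
use warnings;

# The two sample rows from the question (header already skipped).
my @lines = (
    'IG_2146_3437147_3437252_rev_at  /start=3437147 /end=3437252 /direction=+ /description=intergenic region nan nan 43.7    50.1    nan nan nan 26.5',
    'IG_415_642550_642629_fwd_at /start=642550 /end=642629 /direction=+ /description=intergenic region   2212.9  1795.1  1112.6  942.6   614.2   753.4   402.6   535.2',
);

# Indices into @vals follow the header order:
#   0 GSM88918  1 GSM88914  2 GSM88919  3 GSM88915
#   4 GSM88917  5 GSM88913  6 GSM88916  7 GSM88912
# ASSUMED pairing (planktonic - biofilm), extending diff1 = GSM88916 - GSM88912.
my @pairs = ( [ 6, 7 ], [ 4, 5 ], [ 0, 1 ], [ 2, 3 ] );

my %hashGeneName;
for (@lines) {
    my ( $geneName, @vals ) = (split)[ 0, -8 .. -1 ];

    # One difference per pair; undef when either measurement is 'nan'.
    $hashGeneName{$geneName} = [
        map {
            my ( $plank, $bio ) = @vals[ @$_ ];
            ( lc($plank) eq 'nan' || lc($bio) eq 'nan' ) ? undef : $plank - $bio;
        } @pairs
    ];
}

for my $gene ( sort keys %hashGeneName ) {
    my @diffs = map { defined($_) ? $_ : 'nan' } @{ $hashGeneName{$gene} };
    print "$gene: @diffs\n";
}
```

Keeping `undef` rather than a number makes missing measurements explicit; whether to skip, zero-fill, or otherwise treat them is a choice for the analysis.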

Hope this helps!