Storing a table in a data structure using Perl

Time: 2017-10-31 13:24:49

Tags: regex perl multidimensional-array hash tabular

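I am thinking about how to store the following table in a complex data structure, and which data structure to use. The input is a tab-delimited text file exported from Excel. Note that some cells are empty (in this example, the "RQ Max" column). Here is the table: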

Well  Sample Name  Target Name  RQ Max  Ct Mean
1     Sample 1     actin                20,514
2     Sample 1     claudin              30,544
3     Sample 1     occludin             31,183
25    Sample 1     actin                20,514
26    Sample 1     claudin              30,544
27    Sample 1     occludin             31,183
49    Sample 2     actin                20,416
50    Sample 2     claudin              25,611
51    Sample 2     occludin             27,831
73    Sample 2     actin                20,416
74    Sample 2     claudin              25,611
75    Sample 2     occludin             27,831
97    Sample 3     actin                24,213
98    Sample 3     claudin              32,065
99    Sample 3     occludin             34,556
194   H2O          claudin
195   H2O          occludin
217   H2O          actin
218   H2O          claudin
219   H2O          occludin

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;


# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}

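my $input = $ARGV[0];
#chomp ($input);

# Lexical filehandle and three-argument open
open (my $read, '<', $input) or die "Cannot open $input: $!\n";

my $i = 0;    # running row counter, used as the hash key
my %data;
while (my $line = <$read>){
    chomp $line;
    # Keep only data rows, which start with the well number
    if ($line =~ m/^[0-9]/){
        $i++;
        $data{$i} = [ split /\t/, $line ];
    }
}
close $read;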

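As you can see, I am at the very beginning of the program, because I am not sure which structure to use. I actually need only three columns of the whole table: "Sample Name", "Target Name" and "Ct Mean". Later I want to calculate something for each sample, so using the sample names as keys might help. In a hash-of-hashes structure, I would like to use the target name as the "second key". Can someone push me in the right direction? I am currently struggling with storing the data, since I have not used Perl for a long time...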

This is what I would like to end up with:

%data = (
    'Sample 1' => {
        actin    => 20.514,
        claudin  => 30.544,
        occludin => 31.183,
    },
    'Sample 2' => {
        actin    => 20.416,
        claudin  => 25.611,
        occludin => 27.831,
    },
    ...
);

1 Answer:

Answer 0: (score: 1)

So, a few points. If you are reading from a file specified on the command line, a simple shorthand is:

while ( <> ) {

This makes perl read either STDIN or the files specified on the command line, just as sed and grep do.
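A minimal sketch of that behaviour, not part of the original answer. To run without any input file, it writes a small hypothetical temp file (via the core File::Temp module) and places its name in @ARGV, which is where `<>` looks for file names. The `$rows` counter and the sample lines are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny hypothetical input file so the sketch runs on its own.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "Well\tSample Name\n1\tSample 1\n2\tSample 1\n";
close $fh or die "Cannot close $filename: $!\n";

# Pretend the file was named on the command line.
@ARGV = ($filename);

# <> reads the files listed in @ARGV line by line; with an empty @ARGV
# it would read STDIN instead.
my $rows = 0;
while ( my $line = <> ) {
    chomp $line;
    $rows++ if $line =~ /^\d/;    # count lines that start with a digit
}
print "$rows data rows\n";
```

In a real script you would drop the temp-file part and simply run `perl script.pl file.txt` or pipe data in on STDIN.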

Second, you can use a hash slice to parse the tab-separated data.
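As a small illustration of a hash slice on its own, here is a sketch that maps one hard-coded row onto the column names. The `@header` list and the `$line` string are typed in by hand from the table in the question, purely for demonstration.

```perl
use strict;
use warnings;

# Column names as they would come from the header row.
my @header = ( 'Well', 'Sample Name', 'Target Name', 'RQ Max', 'Ct Mean' );

# One data line; the empty RQ Max cell appears as two consecutive tabs.
my $line = "1\tSample 1\tactin\t\t20,514";

# Hash slice: assign the list of fields to the list of header keys in one go.
my %row;
@row{@header} = split /\t/, $line;

print "$row{'Sample Name'} / $row{'Target Name'} / $row{'Ct Mean'}\n";
```

Each field ends up under the name of its column, so the rest of the code can refer to `$row{'Ct Mean'}` instead of a positional index.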

Assuming you only want to extract Ct Mean:

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

my %results; 

#read header row
chomp ( my @header = split /\t/, <> ); 
#tidy up leading whitespace in the fields (there's some in your example data)
s/^\s+// for @header;
#iterate the rest of STDIN or files on command line. 
while ( <> ) {
   #remove trailing linefeed. 
   chomp;
   #tidy up leading whitespace again. 
   s/^\s+//g;

   my %row;
   #use hash slice to read key-value. 
   @row{@header} = split /\t/;
   #print for debug
   print Dumper \%row;

   #skip the H2O lines. 
   next if $row{'Sample Name'} eq 'H2O';

   #Cosmetic assignments - could rewrite to a single one
   my $sample_name = $row{'Sample Name'};
   my $ct_mean = $row{'Ct Mean'};
   my $target_name = $row{'Target Name'};

   $results{$sample_name}{$target_name} = $ct_mean; 
}

print Dumper \%results;

This gives you:

$VAR1 = {
          'Sample 2' => {
                          'occludin' => '27,831',
                          'actin' => '20,416',
                          'claudin' => '25,611'
                        },
          'Sample 3' => {
                          'occludin' => '34,556',
                          'actin' => '24,213',
                          'claudin' => '32,065'
                        },
          'Sample 1' => {
                          'claudin' => '30,544',
                          'occludin' => '31,183',
                          'actin' => '20,514'
                        }
        };

(Note: hashes are explicitly unordered.)
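Two follow-ups may be useful here, sketched under assumptions rather than taken from the original answer. The values are still strings with a decimal comma ('20,514'), while the question asks for numbers such as 20.514; and a stable order needs an explicit sort. The `%results` literal below is a hand-copied excerpt of the output, and treating the comma as a decimal separator is an assumption based on the desired structure shown in the question.

```perl
use strict;
use warnings;

# Hand-copied excerpt of the hash of hashes shown in the output.
my %results = (
    'Sample 1' => { actin => '20,514', claudin => '30,544', occludin => '31,183' },
    'Sample 2' => { actin => '20,416', claudin => '25,611', occludin => '27,831' },
);

# Replace the decimal comma with a decimal point so the values can be
# used as numbers. values() returns aliases, so the hash is updated in place.
for my $targets ( values %results ) {
    tr/,/./ for values %$targets;
}

# Sort the keys at both levels to get a repeatable output order.
for my $sample ( sort keys %results ) {
    for my $target ( sort keys %{ $results{$sample} } ) {
        printf "%s\t%s\t%.3f\n", $sample, $target, $results{$sample}{$target};
    }
}
```

The same `tr/,/./` step could instead be applied to `$ct_mean` just before it is stored, which avoids a second pass over the data.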