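I'm trying to work out how to store the table below in a complex data structure, and which data structure to use. The input is a tab-delimited text file exported from Excel. Note that some cells are empty (in this example, the "RQ Max" column). Here is the table: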
Well  Sample Name  Target Name  RQ Max  Ct Mean
1     Sample 1     actin                20,514
2     Sample 1     claudin              30,544
3     Sample 1     occludin             31,183
25    Sample 1     actin                20,514
26    Sample 1     claudin              30,544
27    Sample 1     occludin             31,183
49    Sample 2     actin                20,416
50    Sample 2     claudin              25,611
51    Sample 2     occludin             27,831
73    Sample 2     actin                20,416
74    Sample 2     claudin              25,611
75    Sample 2     occludin             27,831
97    Sample 3     actin                24,213
98    Sample 3     claudin              32,065
99    Sample 3     occludin             34,556
194   H2O          claudin
195   H2O          occludin
217   H2O          actin
218   H2O          claudin
219   H2O          occludin
Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}
my $input = $ARGV[0];
open (my $read, '<', $input) or die "Cannot open $input: $!\n";
my %data;
my $i = 0;    # running row counter, used as the hash key
while (my $line = <$read>){
    chomp $line;
    # keep only data rows, which start with a well number
    if ($line =~ m/^[0-9]/){
        $i++;
        $data{$i} = [ split /\t/, $line ];
    }
}
close $read;
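As you can see, I'm at the very beginning of the program, because I'm not sure which structure to use. I actually only need three columns of the whole table: "Sample Name", "Target Name" and "Ct Mean". Later I want to calculate something for each sample, so using the sample names as keys would probably help. In a hash of hashes, I'd like the target name to be the "second key". Can someone point me in the right direction? I'm struggling with storing the data because I haven't used Perl for quite a while...
This is what I eventually want: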
%data = (
    'Sample 1' => {
        actin    => 20.514,
        claudin  => 30.544,
        occludin => 31.183,
    },
    'Sample 2' => {
        actin    => 20.416,
        claudin  => 25.611,
        occludin => 27.831,
    },
    ...
);
Answer 0 (score: 1)
So, a couple of points. If you're reading from a file specified on the command line, a simple shorthand is:
while ( <> ) {
where perl reads either STDIN or the files named on the command line, exactly the way sed / grep do.
Second, you can use a hash slice to parse the tab-delimited data.
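As a minimal illustration of the hash-slice idea on its own (the column names and the sample line are taken from the question; the variable names are just picked for this sketch), a slice assigns a whole list of values to a list of keys in one statement:

```perl
use strict;
use warnings;

# Column names from the question's header row.
my @header = ('Well', 'Sample Name', 'Target Name', 'RQ Max', 'Ct Mean');

# One data line; the empty "RQ Max" cell sits between two tabs.
my $line = "1\tSample 1\tactin\t\t20,514";

my %row;
# Hash slice: each key in @header receives the field at the same position.
@row{@header} = split /\t/, $line;

print "$row{'Sample Name'} / $row{'Target Name'} / $row{'Ct Mean'}\n";
# prints "Sample 1 / actin / 20,514"
```

Note that split drops trailing empty fields, so on a line where the last columns are empty (the H2O rows) the corresponding keys end up undefined rather than empty strings.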
Assuming you're only interested in extracting Ct Mean:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %results;
#read header row
chomp ( my @header = split /\t/, <> );
#tidy up leading whitespace in the fields (there's some in your example data)
s/^\s+// for @header;
#iterate the rest of STDIN or files on command line.
while ( <> ) {
    #remove trailing linefeed.
    chomp;
    #tidy up leading whitespace again.
    s/^\s+//;
    my %row;
    #use hash slice to read key-value.
    @row{@header} = split /\t/;
    #print for debug
    print Dumper \%row;
    #skip the H2O lines.
    next if $row{'Sample Name'} eq 'H2O';
    #Cosmetic assignments - could rewrite to a single one
    my $sample_name = $row{'Sample Name'};
    my $ct_mean = $row{'Ct Mean'};
    my $target_name = $row{'Target Name'};
    $results{$sample_name}{$target_name} = $ct_mean;
}
print Dumper \%results;
Which gives you:
$VAR1 = {
  'Sample 2' => {
    'occludin' => '27,831',
    'actin' => '20,416',
    'claudin' => '25,611'
  },
  'Sample 3' => {
    'occludin' => '34,556',
    'actin' => '24,213',
    'claudin' => '32,065'
  },
  'Sample 1' => {
    'claudin' => '30,544',
    'occludin' => '31,183',
    'actin' => '20,514'
  }
};
(Note: hashes are explicitly unordered.)
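The stored values are still strings with a decimal comma, whereas the question asks for numbers such as 20.514. Here is a minimal sketch of one way to handle that, assuming the comma is always a decimal separator and never a thousands separator. The helper name `to_number` is made up for this example, and the hash is only an excerpt of the result above. It also sorts the keys explicitly, since the hash order cannot be relied on:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the string '20,514' into the number 20.514.
# Assumes the comma is a decimal separator, never a thousands separator.
sub to_number {
    my ($text) = @_;
    (my $num = $text) =~ tr/,/./;
    return $num + 0;
}

# A small excerpt of the structure built by the script above.
my %results = (
    'Sample 2' => { actin => '20,416', claudin => '25,611' },
    'Sample 1' => { actin => '20,514', claudin => '30,544' },
);

# Sort the keys explicitly to get a stable, repeatable order.
for my $sample ( sort keys %results ) {
    for my $target ( sort keys %{ $results{$sample} } ) {
        my $ct = to_number( $results{$sample}{$target} );
        printf "%s\t%s\t%.3f\n", $sample, $target, $ct;
    }
}
```

If you would rather store numbers from the start, the same conversion could be applied to the Ct Mean value at the point where it is assigned into the results hash.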