I have input data in the following format (tab-delimited):
(gene, condition, value)
wnt condition1 1
wnt condition2 10
wnt condition3 15
wnt condition4 -1
bmp condition1 10
bmp condition2 inf
bmp condition3 12
bmp condition4 -1
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
I'm building a HoH (hash of hashes) as follows:
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
use Data::Dumper;
my @data = read_file('stack.txt');
my %hash;
foreach (@data){
    chomp;
    # (?:\.\d+)? allows decimal values such as -0.3
    my ($gene, $condition, $value) = (/^(\w+)\t(\w+\d)\t(-?\d+(?:\.\d+)?|-?inf)/);
    $hash{$gene}{$condition} = $value;
}
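I want to loop through the HoH and, for each gene, print out its values only if all of that gene's values are positive (e.g. 10) or all are negative (e.g. -3). With the data above, I would print out only: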
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
because each of the other genes contains conditions with both positive and negative values:
wnt condition1 1
wnt condition2 10
wnt condition3 15
wnt condition4 -1 # discrepancy
bmp condition1 10
bmp condition2 inf
bmp condition3 12
bmp condition4 -1 # discrepancy
I can loop as follows, but I don't know how to compare one value in the HoH with the 'next' value for that gene's condition keys:
for my $gene (sort keys %hash) {
    for my $condition (sort keys %{$hash{$gene}}) {
        my $value = $hash{$gene}{$condition};
        # This obviously will only print out negative values. I want to compare all values here, and if they are all positive, or all negative, print them.
        print "$gene\t$condition\t$value\n" if $value =~ m/-/;
    }
}
Let me know if I can clarify further.
Answer 0 (score: 1)
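Rather than comparing each value with its neighbor in isolation, you can iterate over the whole list of values for a given gene, incrementing separate counters for positive and negative values, and then compare the counts to see whether a discrepancy exists.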
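A minimal sketch of this counter idea as a standalone subroutine, assuming (as this answer does) that a value's sign is judged by a leading minus sign. The name same_sign and the inline %hash built from the question's sample data are illustrative, not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if every value in the list has the
# same sign, judged only by a leading minus sign. As in the answer,
# 'inf' and zero therefore count as positive.
sub same_sign {
    my @values = @_;
    my ($pos, $neg) = (0, 0);
    for my $v (@values) {
        if ($v =~ /^-/) { $neg++ } else { $pos++ }
    }
    # A discrepancy exists only when both counters are non-zero.
    return !($pos && $neg);
}

# Sample data copied from the question (assumed already parsed into a HoH).
my %hash = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15, condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12, condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => -0.3 },
);

for my $gene (sort keys %hash) {
    # Skip genes whose values disagree in sign.
    next unless same_sign(values %{ $hash{$gene} });
    print "$gene\t$_\t$hash{$gene}{$_}\n" for sort keys %{ $hash{$gene} };
}
```

Run as-is, this prints only the three frz lines. Keeping the test in a subroutine makes it easy to swap in a different sign rule later without touching the loop.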
Assuming your data match the following structure:
'bmp' => HASH(0x7324710)
    'condition1' => 10
    'condition2' => 'inf'
    'condition3' => 12
    'condition4' => '-1'
'frz' => HASH(0x7323c78)
    'condition1' => '-12'
    'condition2' => '-6'
    'condition3' => '-0.3'
'wnt' => HASH(0x72a5c30)
    'condition1' => 1
    'condition2' => 10
    'condition3' => 15
    'condition4' => '-1'
Used as a replacement for the last code block in your question, this will give you the result you want:
for my $gene (sort keys %hash) {
    # These variables will contain:
    # - Counts of positive and negative values
    my ($pos_vals, $neg_vals) = (0, 0);
    # - A true/false value indicating whether a discrepancy exists
    my $discrepant = undef;
    # - A list of the values of all conditions for a given gene,
    #   collected from this gene's inner hash
    my @values = values %{ $hash{$gene} };

    # For each such value, test for a leading - and increment
    # the positive or negative value count accordingly
    for (@values) { m/^-/ ? $neg_vals++ : $pos_vals++ }

    # If neither counter is zero (i.e. both evaluate true), then
    # a discrepancy exists; otherwise, one doesn't. Either way,
    # we put the test result in $discrepant so as to produce a
    # cleaner test in the following if statement
    $discrepant = (($pos_vals > 0) and ($neg_vals > 0));

    # In the absence of a discrepancy...
    if (not $discrepant) {
        # iterate over the conditions for this gene and print the gene
        # name, the condition name, and the value
        # NB: this is somewhat idiomatic Perl, but you'll tend to see
        # it from time to time and it's thus worth knowing about
        print "$gene\t$_\t$hash{$gene}->{$_}\n"
            foreach sort keys %{ $hash{$gene} };
    }
}
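NB: This will handle positive infinity correctly, but it treats zero as positive, which may not be appropriate in your case. Do zero values occur in your data? If so, should they be treated as positive, negative, or neither?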
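If zero should count as neither, one option is to tally zeros separately so they never trigger a discrepancy. This is a hedged sketch: the subroutine names classify and consistent_sign are hypothetical, and the zero pattern only recognizes plain forms such as 0, -0 and 0.0 (not exponent forms like 0e0):

```perl
use strict;
use warnings;

# Hypothetical classifier, assuming zero is neither positive nor negative.
sub classify {
    my ($v) = @_;
    return 'zero' if $v =~ /^-?0+(?:\.0+)?$/;   # "0", "-0", "0.0", ...
    return $v =~ /^-/ ? 'neg' : 'pos';          # leading minus => negative
}

# True unless the list contains both positive and negative values;
# zeros are counted but ignored when deciding.
sub consistent_sign {
    my %count = (pos => 0, neg => 0, zero => 0);
    $count{ classify($_) }++ for @_;
    return !($count{pos} && $count{neg});
}

for my $set ([-3, 0, -1], [2, 0, -1]) {
    printf "%s => %s\n", join(',', @$set),
        consistent_sign(@$set) ? 'consistent' : 'discrepant';
}
# prints:
# -3,0,-1 => consistent
# 2,0,-1 => discrepant
```

Under this choice a gene whose values are all zero is also reported as consistent; adjust the final test if that case should be excluded.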
Answer 1 (score: 1)
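This code solves the problem by checking all of the values in each gene's hash, incrementing $neg if a value contains a minus sign and $pos otherwise. If either the positive count or the negative count is zero, then all the values have the same sign, and the data for that gene is sorted and displayed.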
Note that this treats inf and 0 as positive, which may or may not be what you want.
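Note also that using read_file is wasteful, because it pulls the entire file into memory at once. You may as well read the file line by line with a while loop instead of looping over an array. With use autodie in effect, there is no need to check whether the call to open succeeded.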
use strict;
use warnings;
use autodie;

open my $fh, '<', 'stack.txt';

my %data;
while (<$fh>) {
    chomp;
    my ($gene, $condition, $value) = split /\t/;
    $data{$gene}{$condition} = $value;
}

while (my ($gene, $values) = each %data) {
    my ($pos, $neg) = (0, 0);
    ++(/-/ ? $neg : $pos) for values %$values;
    unless ($neg and $pos) {
        for my $condition (sort keys %$values) {
            printf "%s\t%s\t%s\n", $gene, $condition, $values->{$condition};
        }
    }
}
Output
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
Answer 2 (score: -1)
use strict;
use warnings;
use feature 'say';

open my $your_file_handle, '<', 'stack.txt' or die "Cannot open stack.txt: $!";
my @data = <$your_file_handle>;
my %hash;
foreach (@data){
    chomp;
    my ($gene, $condition, $value) = split; # Sorry, your regex didn't work for me,
                                            # hence the change.
    $hash{$gene}{$condition} = $value;
}
for my $gene (sort keys %hash){
    my $values = join '', values %{ $hash{$gene} };
    my $num = keys %{ $hash{$gene} }; # Number of conditions
    # When no '-' is detected, or the number of '-' matches the number of conditions, print.
    say $gene if ($values !~ /-/ or $values =~ tr/-/-/ == $num);
}