Strange collision when computing p-values in Perl with Statistics::TTest

Time: 2012-06-23 16:47:33

Tags: perl statistics

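I have a mysterious problem that I can't figure out.

I'm using Statistics::TTest to compute p-values for thousands of pairs of number distributions.

I use these p-values to make a volcano plot, and when I plot them I see a strange artifact: many points get exactly the same p-value.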

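After some investigation, I can reproduce the phenomenon with the four pairs of number lists in the code below.

When I calculate the p-values for these pairs in Excel, they are all very different (by several orders of magnitude), but with Statistics::TTest I get exactly the same value for every pair.

The p-value is very small (about 1.6e-12), so I wonder whether this is some kind of precision problem, but I can't work it out.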
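One way to probe the precision hypothesis: a Perl double can store values far smaller than 1.6e-12, so storage alone is not the limit, but the *way* a tail probability is computed can be. The sketch below shows that an upper tail obtained as 1 minus a CDF loses everything below roughly 1e-16, because doubles just below 1.0 are spaced about 1.1e-16 apart. This by itself would not explain a floor at 1.6e-12, and nothing here claims it is what Statistics::Distributions does internally; it only illustrates the kind of numerical effect worth ruling out.

```perl
use strict;
use warnings;

# A double holds values far below 1.6e-12 without trouble...
my $tiny = 2.8e-29;
printf "stored directly: %g\n", $tiny;

# ...but a tail computed as 1 - CDF cannot: 1 - 2.8e-29 rounds to exactly
# 1.0, because doubles just below 1.0 are spaced about 1.1e-16 apart, so the
# subtraction below collapses to 0.
my $cdf = 1 - $tiny;
printf "via 1 - CDF:     %g\n", 1 - $cdf;
```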

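If you run the code below, it prints four identical p-values (the t-test probability, t_prob), even though the true p-values range from 1.7e-19 to 2.8e-29.

I tried doing something similar with Statistics::Distributions and hit the same problem, but I believe Statistics::TTest relies on Statistics::Distributions for these calculations anyway.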

I haven't been able to find any other module that performs this calculation.
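Without another module, one option is a dependency-free cross-check. The sketch below implements the equal-variance (pooled) two-sample Student's t-test and takes the two-sided p-value from the regularized incomplete beta function, p = I_{ν/(ν+t²)}(ν/2, 1/2). It is not how Statistics::TTest computes t_prob; the ln-gamma series and continued fraction follow the standard Numerical Recipes approach, and the function names (ln_gamma, beta_cf, inc_beta, student_t_test) are hypothetical. It assumes the pooled test, which corresponds to Excel's TTEST type 2 (two-tailed); if the Excel comparison used type 3 (unequal variances), Welch's statistic and degrees of freedom would be needed instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# ln(Gamma(x)) for x > 0, using the series from Numerical Recipes' gammln
# (relative error around 1e-10, enough for a cross-check).
my @LANCZOS = (76.18009172947146, -86.50532032941677, 24.01409824083091,
               -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5);

sub ln_gamma {
    my ($x) = @_;
    my $y   = $x;
    my $tmp = $x + 5.5;
    $tmp -= ($x + 0.5) * log($tmp);
    my $ser = 1.000000000190015;
    $ser += $_ / ++$y for @LANCZOS;
    return -$tmp + log(2.5066282746310005 * $ser / $x);
}

# Continued fraction for the incomplete beta function (modified Lentz method).
sub beta_cf {
    my ($p, $q, $x) = @_;
    my ($max_iter, $eps, $tiny) = (500, 1e-15, 1e-300);
    my ($qab, $qap, $qam) = ($p + $q, $p + 1, $p - 1);
    my $c = 1;
    my $d = 1 - $qab * $x / $qap;
    $d = $tiny if abs($d) < $tiny;
    $d = 1 / $d;
    my $h = $d;
    for my $m (1 .. $max_iter) {
        my $m2 = 2 * $m;
        # Even step of the fraction.
        my $aa = $m * ($q - $m) * $x / (($qam + $m2) * ($p + $m2));
        $d = 1 + $aa * $d; $d = $tiny if abs($d) < $tiny; $d = 1 / $d;
        $c = 1 + $aa / $c; $c = $tiny if abs($c) < $tiny;
        $h *= $d * $c;
        # Odd step of the fraction.
        $aa = -($p + $m) * ($qab + $m) * $x / (($p + $m2) * ($qap + $m2));
        $d = 1 + $aa * $d; $d = $tiny if abs($d) < $tiny; $d = 1 / $d;
        $c = 1 + $aa / $c; $c = $tiny if abs($c) < $tiny;
        my $del = $d * $c;
        $h *= $del;
        last if abs($del - 1) < $eps;
    }
    return $h;
}

# Regularized incomplete beta function I_x(p, q).
sub inc_beta {
    my ($x, $p, $q) = @_;
    return 0 if $x <= 0;
    return 1 if $x >= 1;
    my $front = exp(ln_gamma($p + $q) - ln_gamma($p) - ln_gamma($q)
                    + $p * log($x) + $q * log(1 - $x));
    # For large |t|, x is small and this first branch is used; it never forms
    # 1 - (something), so tiny tail probabilities are not cancelled away.
    return $front * beta_cf($p, $q, $x) / $p if $x < ($p + 1) / ($p + $q + 2);
    return 1 - $front * beta_cf($q, $p, 1 - $x) / $q;
}

# Sample size, mean and unbiased variance of an array reference.
sub mean_and_var {
    my ($v) = @_;
    my $n    = scalar @$v;
    my $mean = sum(@$v) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @$v) / ($n - 1);
    return ($n, $mean, $var);
}

# Pooled-variance two-sample t-test (zero variance in both groups not handled).
sub student_t_test {
    my ($x, $y) = @_;
    my ($n1, $m1, $v1) = mean_and_var($x);
    my ($n2, $m2, $v2) = mean_and_var($y);
    my $df     = $n1 + $n2 - 2;
    my $pooled = (($n1 - 1) * $v1 + ($n2 - 1) * $v2) / $df;
    my $t      = ($m1 - $m2) / sqrt($pooled * (1 / $n1 + 1 / $n2));
    # Two-sided p-value: P(|T_df| > |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    my $p      = inc_beta($df / ($df + $t * $t), $df / 2, 0.5);
    return ($t, $df, $p);
}

my ($t, $df, $p) = student_t_test([1 .. 5], [6 .. 10]);
printf "t = %.4f, df = %d, two-sided p = %.4g\n", $t, $df, $p;
```

With the question's data, each pair would be passed as `student_t_test($datasets{$name}[0], $datasets{$name}[1])`, and the results compared against both t_prob and Excel.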

I should note that the vast majority (99%) of the distribution pairs get correct p-values. Only a handful collide on this one wrong value.
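Since only about 1% of pairs are affected, one way to find them is to group pairs by their exact p-value and list any value shared by more than one pair. The %p_for hash below (pair ID to p-value) holds made-up illustrative values; in practice it would be filled from each test's t_prob. Genuinely tied p-values can occur, so a hit is only a candidate for re-checking with another method.

```perl
use strict;
use warnings;

# Hypothetical input: pair ID => p-value (illustrative values only).
my %p_for = (a => 1.6e-12, b => 1.6e-12, c => 3.2e-5, d => 1.6e-12, e => 0.04);

# Group pair IDs by the stringified p-value.
my %ids_for;
push @{ $ids_for{ $p_for{$_} } }, $_ for sort keys %p_for;

# Any p-value shared by more than one pair is a candidate collision.
my @collisions = grep { @{ $ids_for{$_} } > 1 } sort keys %ids_for;
for my $p (@collisions) {
    printf "%s shared by: %s\n", $p, join(',', @{ $ids_for{$p} });
}
```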

Can anyone shed some light on what is going wrong?

Any help is greatly appreciated, thanks!

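EDIT: I still haven't figured out what is going on in Statistics::TTest or Statistics::Distributions, but I found another module that works correctly, and I'm posting it here in case anyone else runs into this problem. As I understand it, a one-way ANOVA on a single pair of distributions is equivalent to Student's (equal-variance) t-test, so I tried Statistics::ANOVA, and it worked: running it over each pair in %datasets, as defined in the code below, gives the correct p-values (matching the values Excel gives).

Here is the code: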

#!/usr/bin/perl

use strict;
use warnings;
use Statistics::TTest;

# Each entry holds one pair of samples: [0] is the first group, [1] the second.
my %datasets = ();


@{$datasets{a}[0]} = (0.722466024,0.925999419,1,1.049630768,1.056583528,1.10433666,1.13093087,1.150559677,1.220329955,1.316145742,1.333423734,1.63691458,1.691534165,0.713695815,0.815575429,0.918386234,0.925999419,0.941106311,0.948600847,0.970853654,0.98550043,0.98550043,1,1.028569152,1.042644337,1.10433666,1.117695043,1.269033146,1.286881148,1.298658316,1.575312331);
@{$datasets{a}[1]} = (-0.49410907,-0.358453971,-0.321928095,-0.286304185,-0.200912694,-0.200912694,-0.168122759,-0.120294234,-0.120294234,-0.104697379,-0.104697379,-0.074000581,-0.058893689,-0.577766999,-0.514573173,-0.514573173,-0.49410907,-0.358453971,-0.358453971,-0.304006187,-0.251538767,-0.184424571);

@{$datasets{b}[0]} = (-0.434402824,-0.286304185,-0.251538767,-0.058893689,-0.043943348,-0.043943348,0.084064265,0.163498732,0.23878686,0.23878686,0.310340121,0.839959587,0.879705766,-0.556393349,-0.268816758,-0.251538767,-0.152003093,-0.104697379,-0.089267338,-0.029146346,-0.029146346,0,0.070389328,0.084064265,0.097610797,0.124328135,0.137503524,0.189033824,0.189033824,0.214124805,0.214124805,0.214124805,0.321928095,0.333423734,0.367371066,0.40053793,0.411426246,0.443606651,0.516015147,0.669026766,0.713695815);
@{$datasets{b}[1]} = (0.782408565,0.799087306,0.82374936,0.887525271,0.925999419,0.933572638,0.956056652,0.97819563,0.98550043,1.021479727,1.084064265,1.097610797,1.13093087,1.150559677,1.15704371,1.176322773,1.182692298,1.22650853,1.286881148,1.292781749,1.310340121,1.459431619,1.485426827,1.521050737,1.59454855,1.695993813,1.713695815,1.726831217,0.40053793,0.411426246,0.59454855,0.925999419,0.941106311,0.948600847,0.98550043,1.028569152,1.070389328,1.117695043,1.124328135,1.220329955,1.316145742,1.744161096);

@{$datasets{c}[0]} = (-0.043943348,-0.029146346,-0.01449957,0.028569152,0.097610797,0.124328135,0.176322773,0.201633861,0.263034406,-0.862496476,-0.104697379,0.084064265,0.084064265,0.084064265,0.124328135,0.124328135,0.163498732,0.263034406,0.275007047,0.286881148,0.321928095,0.333423734);
@{$datasets{c}[1]} = (-2.64385619,-2.556393349,-2.473931188,-2.395928676,-2.395928676,-2.395928676,-2.321928095,-2.321928095,-2.321928095,-2.251538767,-2.251538767,-2.184424571,-2.120294234,-2,-0.535331733,-1.64385619,-1.556393349,-1.514573173,-1.514573173,-1.473931188,-1.434402824,-1.434402824,-1.395928676,-1.395928676,-1.395928676,-1.395928676,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.321928095,-1.286304185,-1.286304185,-1.286304185,-1.251538767,-1.217591435,-1.120294234,-1);

@{$datasets{d}[0]} = (0.933572638261024,0.948600847493356,0.948600847493356,0.970853654340483,0.978195629681652,1.111031312388740,1.150559676575380,1.416839741912830,0.731183241572200,0.790772037862000,0.815575428862573,0.855989697308481,0.871843648509318,0.895302621333307,0.933572638261024,0.941106310946431,0.948600847493356,0.956056652412403,0.970853654340483,0.992768430768924,1.000000000000000,1.063502942306160,1.226508529808680,1.269033146455240,1.298658315564520,1.704871964456350);
@{$datasets{d}[1]} = (-0.473931188332412,0.028569152196771,0.042644337408494,0.056583528366368,0.070389327891398,0.084064264788475,0.097610796626422,0.111031312388744,0.454175893185802,0.454175893185802,-0.514573172829758,-0.268816758427800,-0.168122758808327,-0.136061549576028,-0.043943347587597,0.014355292977070,0.111031312388744,0.124328135002202,0.137503523749935,0.176322772640463,0.238786859587116,0.250961573533219,0.344828496997441);


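# Run a two-sample t-test on each pair and print its p-value (t_prob).
foreach my $dataset (sort keys %datasets) {

    my $ttest = Statistics::TTest->new;
    # $datasets{$dataset}[0] and [1] are already array references.
    $ttest->load_data($datasets{$dataset}[0], $datasets{$dataset}[1]);
    print "$dataset - t_prob:\t$ttest->{t_prob}\n\n";

}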
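The EDIT's reasoning, that a one-way ANOVA on two groups matches Student's t-test, can be written out. With group sizes n_1 and n_2, sample variances s_1² and s_2², pooled variance s_p² and ν = n_1 + n_2 − 2 (notation introduced here), the ANOVA F statistic has (1, ν) degrees of freedom and equals the square of the pooled t statistic, so both tail probabilities reduce to the same incomplete beta value. This equivalence holds for the equal-variance test only; an unequal-variance (Welch) p-value would generally differ.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\nu}, \qquad
\nu = n_1 + n_2 - 2

F = \frac{SS_{\text{between}}/1}{SS_{\text{within}}/\nu}
  = \frac{\frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2}{s_p^2}
  = t^2

p_{\text{ANOVA}} = \Pr(F_{1,\nu} > F)
  = I_{\nu/(\nu + F)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right)
  = I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right)
  = \Pr(|T_\nu| > |t|) = p_t
```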

1 Answer:

Answer 0 (score: 0)

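If you have a 64-bit platform, you might gain something by installing a 64-bit perl, but this question is probably best directed at the modules' authors.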

Statistics::TTest was written by Yun-Fang Juan.

Statistics::Distributions was written by Michael Kospach.
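The 64-bit suggestion can be checked before reinstalling anything. This sketch uses the core Config module to print the settings that describe how the running perl stores numbers. A 64-bit build typically widens integers (ivsize), while floating-point values (nvtype/nvsize) remain IEEE doubles unless perl was built with long doubles or quadmath, so a 64-bit build alone may not change p-value precision.

```perl
use strict;
use warnings;
use Config;

# Print the build settings that govern integer (IV) and floating-point (NV)
# storage; keys that are not set on this build print as 'undef'.
for my $key (qw(archname ivsize nvtype nvsize uselongdouble)) {
    my $value = defined $Config{$key} ? $Config{$key} : 'undef';
    printf "%-14s %s\n", $key, $value;
}
```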