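I have a mysterious problem that I can't figure out. I am using Statistics::TTest to compute p-values for thousands of pairs of numeric distributions. I use these p-values to create a volcano plot, and when I plotted them I noticed a strange artifact: many points get exactly the same p-value. After some investigation, I can reproduce the phenomenon with the four pairs of distributions in the code below. When I calculate the p-values for these pairs in Excel, the values all differ widely (by several orders of magnitude), but with Statistics::TTest I get exactly the same value for every pair. The p-value is very small (about 1.6e-12), so I wonder whether this is some kind of precision problem, but I can't work it out.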
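If you run the code below, it prints four identical p-values (the t-test probability, t_prob), even though the true p-values range from 1.7e-19 to 2.8e-29. I tried doing something similar with Statistics::Distributions and ran into the same problem, but I believe Statistics::TTest relies on Statistics::Distributions for these calculations. I could not find any other module that performs this calculation.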
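I should note that the vast majority (99%) of the distribution pairs get the correct p-value. Only a few produce this strange collision on a wrong value.
Can anyone see what is going wrong? Any help is much appreciated, thanks!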
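Edit: I still haven't figured out what is happening with Statistics::TTest or Statistics::Distributions, but I found another module that works correctly. I'm posting it here in case anyone else runs into this problem. As I understand it, a one-way ANOVA on a single pair of distributions is equivalent to Student's t-test. So I tried Statistics::ANOVA, and it worked: looping over the %datasets defined in the code below with Statistics::ANOVA gives the correct p-values (matching the values Excel gives).
Here is the code that reproduces the problem with Statistics::TTest: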
#!/usr/bin/perl -w
use strict;
use Statistics::TTest;
my %datasets = ();
@{$datasets{a}[0]} = (0.722466024,0.925999419,1,1.049630768,1.056583528,1.10433666,1.13093087,1.150559677,1.220329955,1.316145742,1.333423734,1.63691458,1.691534165,0.713695815,0.815575429,0.918386234,0.925999419,0.941106311,0.948600847,0.970853654,0.98550043,0.98550043,1,1.028569152,1.042644337,1.10433666,1.117695043,1.269033146,1.286881148,1.298658316,1.575312331);
@{$datasets{a}[1]} = (-0.49410907,-0.358453971,-0.321928095,-0.286304185,-0.200912694,-0.200912694,-0.168122759,-0.120294234,-0.120294234,-0.104697379,-0.104697379,-0.074000581,-0.058893689,-0.577766999,-0.514573173,-0.514573173,-0.49410907,-0.358453971,-0.358453971,-0.304006187,-0.251538767,-0.184424571);
@{$datasets{b}[0]} = (-0.434402824,-0.286304185,-0.251538767,-0.058893689,-0.043943348,-0.043943348,0.084064265,0.163498732,0.23878686,0.23878686,0.310340121,0.839959587,0.879705766,-0.556393349,-0.268816758,-0.251538767,-0.152003093,-0.104697379,-0.089267338,-0.029146346,-0.029146346,0,0.070389328,0.084064265,0.097610797,0.124328135,0.137503524,0.189033824,0.189033824,0.214124805,0.214124805,0.214124805,0.321928095,0.333423734,0.367371066,0.40053793,0.411426246,0.443606651,0.516015147,0.669026766,0.713695815);
@{$datasets{b}[1]} = (0.782408565,0.799087306,0.82374936,0.887525271,0.925999419,0.933572638,0.956056652,0.97819563,0.98550043,1.021479727,1.084064265,1.097610797,1.13093087,1.150559677,1.15704371,1.176322773,1.182692298,1.22650853,1.286881148,1.292781749,1.310340121,1.459431619,1.485426827,1.521050737,1.59454855,1.695993813,1.713695815,1.726831217,0.40053793,0.411426246,0.59454855,0.925999419,0.941106311,0.948600847,0.98550043,1.028569152,1.070389328,1.117695043,1.124328135,1.220329955,1.316145742,1.744161096);
@{$datasets{c}[0]} = (-0.043943348,-0.029146346,-0.01449957,0.028569152,0.097610797,0.124328135,0.176322773,0.201633861,0.263034406,-0.862496476,-0.104697379,0.084064265,0.084064265,0.084064265,0.124328135,0.124328135,0.163498732,0.263034406,0.275007047,0.286881148,0.321928095,0.333423734);
@{$datasets{c}[1]} = (-2.64385619,-2.556393349,-2.473931188,-2.395928676,-2.395928676,-2.395928676,-2.321928095,-2.321928095,-2.321928095,-2.251538767,-2.251538767,-2.184424571,-2.120294234,-2,-0.535331733,-1.64385619,-1.556393349,-1.514573173,-1.514573173,-1.473931188,-1.434402824,-1.434402824,-1.395928676,-1.395928676,-1.395928676,-1.395928676,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.321928095,-1.286304185,-1.286304185,-1.286304185,-1.251538767,-1.217591435,-1.120294234,-1);
@{$datasets{d}[0]} = (0.933572638261024,0.948600847493356,0.948600847493356,0.970853654340483,0.978195629681652,1.111031312388740,1.150559676575380,1.416839741912830,0.731183241572200,0.790772037862000,0.815575428862573,0.855989697308481,0.871843648509318,0.895302621333307,0.933572638261024,0.941106310946431,0.948600847493356,0.956056652412403,0.970853654340483,0.992768430768924,1.000000000000000,1.063502942306160,1.226508529808680,1.269033146455240,1.298658315564520,1.704871964456350);
@{$datasets{d}[1]} = (-0.473931188332412,0.028569152196771,0.042644337408494,0.056583528366368,0.070389327891398,0.084064264788475,0.097610796626422,0.111031312388744,0.454175893185802,0.454175893185802,-0.514573172829758,-0.268816758427800,-0.168122758808327,-0.136061549576028,-0.043943347587597,0.014355292977070,0.111031312388744,0.124328135002202,0.137503523749935,0.176322772640463,0.238786859587116,0.250961573533219,0.344828496997441);
# Run the t-test on each pair and print the p-value the module reports.
foreach my $dataset (sort keys %datasets) {
    my $ttest = Statistics::TTest->new;
    $ttest->load_data($datasets{$dataset}[0], $datasets{$dataset}[1]);
    print "$dataset - t_prob:\t$ttest->{t_prob}\n\n";
}
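One way to cross-check t_prob without going through Statistics::Distributions is to compute the p-value directly in plain Perl. The following is only a minimal sketch under stated assumptions: it uses the equal-variance (pooled) Student's t-test, which may not be the variant Statistics::TTest or Excel applied to a given pair, and the helper names (log_gamma, beta_cf, reg_inc_beta, pooled_t_test) are hypothetical, not part of any module. It relies on the identity that the two-sided p-value equals the regularized incomplete beta function I_x(df/2, 1/2) with x = df / (df + t^2), evaluated with a standard continued fraction.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# log(Gamma(x)) for x > 0, using a Lanczos approximation.
sub log_gamma {
    my ($x) = @_;
    my @coef = (76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5);
    my $y   = $x;
    my $tmp = $x + 5.5;
    $tmp -= ($x + 0.5) * log($tmp);
    my $ser = 1.000000000190015;
    for my $c (@coef) { $y += 1; $ser += $c / $y; }
    return -$tmp + log(2.5066282746310005 * $ser / $x);
}

# Continued fraction for the incomplete beta function (modified Lentz method).
sub beta_cf {
    my ($p, $q, $x) = @_;
    my $tiny  = 1e-300;
    my $clamp = sub { abs($_[0]) < $tiny ? $tiny : $_[0] };
    my ($c, $d) = (1, 1 / $clamp->(1 - ($p + $q) * $x / ($p + 1)));
    my $h = $d;
    for my $m (1 .. 500) {
        my $m2 = 2 * $m;
        # Even step, then odd step, of the recurrence.
        for my $aa ( $m * ($q - $m) * $x / (($p + $m2 - 1) * ($p + $m2)),
                     -($p + $m) * ($p + $q + $m) * $x / (($p + $m2) * ($p + $m2 + 1)) ) {
            $d = 1 / $clamp->(1 + $aa * $d);
            $c = $clamp->(1 + $aa / $c);
            $h *= $d * $c;
        }
        last if abs($d * $c - 1) < 1e-14;    # last correction factor is ~1
    }
    return $h;
}

# Regularized incomplete beta function I_x(p, q).
sub reg_inc_beta {
    my ($p, $q, $x) = @_;
    return 0 if $x <= 0;
    return 1 if $x >= 1;
    my $front = exp(log_gamma($p + $q) - log_gamma($p) - log_gamma($q)
                    + $p * log($x) + $q * log(1 - $x));
    return $x < ($p + 1) / ($p + $q + 2)
        ? $front * beta_cf($p, $q, $x) / $p
        : 1 - $front * beta_cf($q, $p, 1 - $x) / $q;
}

# Sample size, mean and unbiased variance of an array reference.
sub mean_var {
    my ($data) = @_;
    my $n = @$data;
    my $mean = 0;
    $mean += $_ / $n for @$data;
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @$data;
    return ($n, $mean, $ss / ($n - 1));
}

# Pooled-variance two-sample t-test: returns (t, df, two-sided p-value).
# Assumes both samples have at least two values and are not all constant.
sub pooled_t_test {
    my ($x, $y) = @_;
    my ($n1, $m1, $v1) = mean_var($x);
    my ($n2, $m2, $v2) = mean_var($y);
    my $df  = $n1 + $n2 - 2;
    my $sp2 = (($n1 - 1) * $v1 + ($n2 - 1) * $v2) / $df;
    my $t   = ($m1 - $m2) / sqrt($sp2 * (1 / $n1 + 1 / $n2));
    return ($t, $df, reg_inc_beta($df / 2, 0.5, $df / ($df + $t * $t)));
}

# Illustrative data only (not taken from the question).
my ($t, $df, $pval) = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]);
printf "t = %.4f, df = %d, p = %.6g\n", $t, $df, $pval;
```

To apply it to the data here, call pooled_t_test($datasets{$dataset}[0], $datasets{$dataset}[1]) inside the loop. Because the tail probability is built from exp() of a sum of logarithms, it is not computed as 1 minus a value close to 1, which is one common way very small p-values lose their precision.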
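Answer 0 (score: 0)
If you are on a 64-bit platform, you might gain something by installing a 64-bit perl, but this question is probably best directed at the module's author.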
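Before reinstalling anything, it may help to look at how the current perl was built. The sketch below uses the core Config module to report the integer and floating-point sizes; the helper name perl_numeric_config is hypothetical. On many builds floating-point values (NVs) are 8-byte doubles whatever the integer width, so switching to a 64-bit perl may well leave a floating-point result like t_prob unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Collect the build settings that affect numeric range and precision.
sub perl_numeric_config {
    return (
        archname      => $Config{archname},
        ivsize        => $Config{ivsize},                   # bytes per integer (IV)
        nvsize        => $Config{nvsize},                   # bytes per float (NV)
        use64bitint   => $Config{use64bitint}   // 'undef',
        uselongdouble => $Config{uselongdouble} // 'undef',
    );
}

my %cfg = perl_numeric_config();
printf "%-14s %s\n", $_, $cfg{$_} for sort keys %cfg;
```

The same values are available from the command line with perl -V:ivsize and perl -V:nvsize.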