Same p-value for standardized and non-standardized data

Time: 2017-04-14 07:29:08

Tags: pandas scipy anova p-value

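I am trying to run a one-way ANOVA on the three plant-growth groups (ctrl, trt1, trt2) in this dataset: http://vincentarelbundock.github.io/Rdatasets/csv/datasets/PlantGrowth.csv. I am using a combination of Pandas and SciPy. However, the F and p values I get after column-wise z-score normalization of the data are identical to those I get without normalization! Can anyone tell me why this happens?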

    import pandas as pd
    from scipy import stats

    datafile = "../data/PlantGrowth.csv"
    data = pd.read_csv(datafile)

    # Column-wise z-score: subtract the mean, then divide by the population std.
    # Note the parentheses: (x - mean) / std, not x - mean / std.
    weight_zscore = 'weight' + '_zscore'
    data[weight_zscore] = (data['weight'] - data['weight'].mean()) / data['weight'].std(ddof=0)

    # Split the raw and the z-scored weights by treatment group
    grps = pd.unique(data['group'].values)
    weight_data = {grp: data['weight'][data['group'] == grp] for grp in grps}
    weight_zscore_data = {grp: data[weight_zscore][data['group'] == grp] for grp in grps}

    # One-way ANOVA on the raw weights and on the z-scored weights
    F, p = stats.f_oneway(weight_data['ctrl'], weight_data['trt1'], weight_data['trt2'])
    Fz, pz = stats.f_oneway(weight_zscore_data['ctrl'], weight_zscore_data['trt1'], weight_zscore_data['trt2'])
    print("Non-normalized weight:", F, p)
    print("Normalized weight:", Fz, pz)

The output is:

    Non-normalized weight: 4.84608786238, 0.0159099583256 
    Normalized weight: 4.84608786238, 0.0159099583256

1 answer:

Answer 0 (score: 1)

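I think it is because normalization applies the same linear transformation (a shift followed by a rescaling) to every value in the dataset, so it does not affect the result of the test. For example, if you are testing for a difference in means, subtracting 5 from every value shifts all the group means by the same amount and leaves the test result unchanged. Likewise, dividing the whole dataset by a constant scales the between-group and within-group variances by the same factor, so their ratio (the F statistic), and therefore the p-value, stays the same.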
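The invariance described above can be checked numerically. The following is a minimal sketch, assuming synthetic normally distributed groups; the group means, spread, sample size, and seed are made up for illustration and are not the PlantGrowth values. It z-scores all observations with one pooled mean and standard deviation, as in the question, and compares the ANOVA results.

```python
import numpy as np
from scipy import stats

# Synthetic data: three groups of 10 observations (illustrative assumption,
# not the PlantGrowth measurements).
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=0.6, size=10) for mu in (5.0, 4.7, 5.5)]

# Z-score with the mean and population std of ALL observations pooled,
# i.e. the same shift and scale is applied to every group.
pooled = np.concatenate(groups)
mean, std = pooled.mean(), pooled.std(ddof=0)
z_groups = [(g - mean) / std for g in groups]

# One-way ANOVA before and after the transformation
F, p = stats.f_oneway(*groups)
Fz, pz = stats.f_oneway(*z_groups)

print("raw:     ", F, p)
print("z-scored:", Fz, pz)
print("same F and p (up to rounding):", np.isclose(F, Fz) and np.isclose(p, pz))
```

The two results should agree up to floating-point rounding, because the shift cancels in every difference from a mean and the scale factor cancels in the ratio of mean squares.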