Getting weird values for a statistical test in Python

Asked: 2017-10-31 16:28:40

Tags: python arrays statistics jupyter-notebook definition

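I am trying to perform a Mann-Kendall test on some data. I am using the code from the following link (https://github.com/mps9506/Mann-Kendall-Trend/blob/master/mk_test.py), slightly modified so that the result is an array containing only a p-value (p) and a tau value (z).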

import numpy as np
from scipy.stats import norm


def mk_test(x, alpha=0.05):
    """
    This function is derived from code originally posted by Sat Kumar Tomer
    (satkumartomer@gmail.com)
    See also: http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm
    The purpose of the Mann-Kendall (MK) test (Mann 1945, Kendall 1975, Gilbert
    1987) is to statistically assess if there is a monotonic upward or downward
    trend of the variable of interest over time. A monotonic upward (downward)
    trend means that the variable consistently increases (decreases) through
    time, but the trend may or may not be linear. The MK test can be used in
    place of a parametric linear regression analysis, which can be used to test
    if the slope of the estimated linear regression line is different from
    zero. The regression analysis requires that the residuals from the fitted
    regression line be normally distributed; an assumption not required by the
    MK test, that is, the MK test is a non-parametric (distribution-free) test.
    Hirsch, Slack and Smith (1982, page 107) indicate that the MK test is best
    viewed as an exploratory analysis and is most appropriately used to
    identify stations where changes are significant or of large magnitude and
    to quantify these findings.
    Input:
        x:   a vector of data
        alpha: significance level (0.05 default)
    Output (this modified version returns a 2-element array [p, z]):
        p: p value of the significance test
        z: normalized test statistic

    Examples
    --------
      >>> x = np.random.rand(100)
      >>> p, z = mk_test(x, 0.05)
    """
    n = len(x)

    # calculate S
    s = 0
    for k in range(n-1):
        for j in range(k+1, n):
            s += np.sign(x[j] - x[k])

    # calculate the unique data
    unique_x = np.unique(x)
    g = len(unique_x)

    # calculate the var(s)
    if n == g:  # there is no tie
        var_s = (n*(n-1)*(2*n+5))/18
    else:  # there are some ties in data
        tp = np.zeros(unique_x.shape)
        for i in range(len(unique_x)):
            tp[i] = sum(x == unique_x[i])
        var_s = (n*(n-1)*(2*n+5) - np.sum(tp*(tp-1)*(2*tp+5)))/18

    # normalize S to z, applying a continuity correction of 1
    if s > 0:
        z = (s - 1)/np.sqrt(var_s)
    elif s == 0:
        z = 0
    elif s < 0:
        z = (s + 1)/np.sqrt(var_s)

    # calculate the p_value
    p = 2*(1 - norm.cdf(abs(z)))  # two-tailed test
    h = abs(z) > norm.ppf(1 - alpha/2)  # significance flag (computed but not returned)

    # modified return: only the p-value and z, as a 2-element array
    return np.array([p, z])

I then run the test with the following code.

out = np.empty((0))
for i in range(145):
    for j in range(192):
        out1 = mk_test(yrmax[:,i,j], alpha=0.05)
        out = np.append(out, out1, axis=0)

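I feel something is going wrong when I run the test, because I expect z values between -1 and 1, but I am getting some values greater than 1. Is there a mistake in the code, or have I misunderstood what z is, so that it is not actually tau and that is why I am getting values I did not expect?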

1 Answer:

Answer 0 (score: 0)

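This turned out to be a statistics question rather than a coding one. On Stack Exchange I was given a solution that modifies this code to return the Kendall tau value instead of the z value. They also explained what the z value is: a normalized test statistic, not a correlation, so it is not confined to the range -1 to 1. As long as that question stays up, I will leave the link here for anyone who makes a similar mistake.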

https://stats.stackexchange.com/questions/311061/getting-weird-values-for-a-statistical-test-in-python
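
For anyone who cannot follow the link, here is a minimal sketch of the idea. It is not the code from the linked answer. The function name `mk_tau_and_z` is hypothetical, and it assumes a 1-D series with no NaNs. It computes the same S statistic as `mk_test`, then divides S by the number of pairs, n(n-1)/2, to get Kendall's tau-a, which always lies between -1 and 1. The z value divides S by its standard deviation instead, so it can be much larger than 1.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def mk_tau_and_z(x):
    """Return (tau, z, p) for a 1-D series x (assumed free of NaNs)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # S: sum of sign(x[j] - x[k]) over all pairs with j > k,
    # the same quantity the double loop in mk_test accumulates
    diffs = x[None, :] - x[:, None]  # diffs[k, j] = x[j] - x[k]
    s = np.sign(diffs[np.triu_indices(n, k=1)]).sum()

    # Kendall's tau-a: S scaled by the number of pairs, so -1 <= tau <= 1
    tau = s / (n * (n - 1) / 2)

    # Variance of S with the tie correction (the correction is 0 without ties)
    _, counts = np.unique(x, return_counts=True)
    var_s = (n * (n - 1) * (2 * n + 5)
             - np.sum(counts * (counts - 1) * (2 * counts + 5))) / 18

    # z: S scaled by its standard deviation, with a continuity correction
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0

    p = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value
    return tau, z, p


# A strictly increasing series: tau is 1, while z is well above 1
x = np.arange(10.0)
tau, z, p = mk_tau_and_z(x)
print(tau, z, p)

# Cross-check against SciPy, using the time index as the first variable
tau_b, p_b = kendalltau(np.arange(len(x)), x)
print(tau_b)
```

For the ten-point increasing series above, tau is 1 while z is about 3.94, which is the behaviour described in the question. `scipy.stats.kendalltau` returns tau-b, which matches tau-a when the data contain no ties. Its p-value is computed differently from the one here, so the two need not agree exactly.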