Question

Given df:

            A         B         C
Date            
2010-01-17  -0.9304   3.7477    0.0000
2010-01-24  -3.6348   1.5733   -3.6348
2010-01-31  -1.8950   0.4957   -1.8950
2010-02-07  -0.6990  -0.1480   -0.6990
2010-02-14   1.4635  -3.4206    1.4635

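I would like to compare the mean of df['C'] with 10,000 random series, each created by picking one element from df['A'] OR df['B'] for each date, to see where that mean ranks (1 if it is the highest, 0.95 if it is higher than 9,500 of them, and so on).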
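The comparison described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `rank_of_mean` and its parameters (`target`, `sources`, `n_draws`, `seed`) are hypothetical, and the DataFrame is rebuilt from the five rows shown. With only 5 dates there are just 2^5 = 32 distinct series, so 10,000 draws repeat heavily on this toy data.

```python
import numpy as np
import pandas as pd

def rank_of_mean(df, target="C", sources=("A", "B"), n_draws=10_000, seed=0):
    """Share of random series whose mean is below the mean of df[target].

    Hypothetical helper: each random series takes, for every date,
    the value from one randomly chosen source column.
    """
    rng = np.random.default_rng(seed)
    values = df[list(sources)].to_numpy()            # shape (n_dates, n_sources)
    n_dates, n_sources = values.shape
    # For every draw and every date, choose which source column to use.
    picks = rng.integers(0, n_sources, size=(n_draws, n_dates))
    # Element [i, j] is values[j, picks[i, j]].
    random_series = values[np.arange(n_dates), picks]
    random_means = random_series.mean(axis=1)        # one mean per random series
    return float(np.mean(df[target].mean() > random_means))

# The sample data from the question.
df = pd.DataFrame(
    {"A": [-0.9304, -3.6348, -1.8950, -0.6990, 1.4635],
     "B": [3.7477, 1.5733, 0.4957, -0.1480, -3.4206],
     "C": [0.0000, -3.6348, -1.8950, -0.6990, 1.4635]},
    index=pd.to_datetime(["2010-01-17", "2010-01-24", "2010-01-31",
                          "2010-02-07", "2010-02-14"]),
)
df.index.name = "Date"

score = rank_of_mean(df)   # a value between 0 and 1
```

A score of 1 would mean the mean of C beats every random series; 0.95 would mean it beats 9,500 of the 10,000.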

I wrote a function for this a while ago, but I can no longer piece it together. Maybe it helps:

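import numpy as np
import numpy.random as npr

def mean_diff(d):
    # d maps each key k to a pair (l, t): population l and sample t
    result = {}
    for k, (l, t) in d.items():  # items() replaces Python 2's iteritems()
        m = np.mean(t)
        len_ = len(t)
        # share of 10000 resamples (with replacement) whose mean is below m
        result[k] = np.mean([m > np.mean(npr.choice(l, len_, True))
                            for _ in range(10000)])
    return result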

Thanks

** 10,000 because the original data has more than 5 rows.

Update:

OK, to solve this I first have to solve a smaller problem. See question

Answer 1

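Well, there is a shortcut:

Since columns A and B have the same number of elements, we can put them into a single list, draw 10,000 random samples from that list, and compare their means with the mean of C.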

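import numpy as np
import numpy.random as npr

sample = df['C'].values
a = df['A'].values
b = df['B'].values
# pool A and B into one population to draw from
population = np.concatenate((a, b), axis=0)

def mean_diff(s, p):
    m = np.mean(s)
    len_ = len(s)
    # share of 10000 resamples (with replacement) whose mean is below m
    result = np.mean([m > np.mean(npr.choice(p, len_, True))
                      for _ in range(10000)])
    return result

mean_diff(sample, population)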
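The same pooled resampling can probably be done without the Python-level loop by drawing all samples at once. The sketch below is an assumption-laden variant, not the answer's original code: the name `mean_diff_vectorized` and the `n_draws` and `seed` parameters are hypothetical, and the arrays are typed in from the table in the question.

```python
import numpy as np

def mean_diff_vectorized(sample, population, n_draws=10_000, seed=0):
    """Share of resampled means (with replacement) below the sample mean."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    population = np.asarray(population, dtype=float)
    # One row per resample, each the same length as the sample.
    draws = rng.choice(population, size=(n_draws, sample.size), replace=True)
    return float(np.mean(sample.mean() > draws.mean(axis=1)))

# Columns A, B and C from the question.
a = np.array([-0.9304, -3.6348, -1.8950, -0.6990, 1.4635])
b = np.array([3.7477, 1.5733, 0.4957, -0.1480, -3.4206])
c = np.array([0.0000, -3.6348, -1.8950, -0.6990, 1.4635])

score = mean_diff_vectorized(c, np.concatenate((a, b)))
```

Note that pooling A and B and drawing with replacement is not quite the same as picking A or B once per date: a pooled draw can contain both values from one date, or the same value twice. Whether that matters depends on what the ranking is meant to test.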

Comparing a sample mean with random classification in Python
