How to find outliers in a MultiIndex DataFrame

Time: 2018-11-07 05:22:15

Tags: python pandas numpy

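The result should be a MultiIndex DataFrame that contains no outliers. The criterion is based on the standard deviation: np.abs(x-g_mean) <= 3*g_std

I am trying to identify statistical outliers: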

import pandas as pd
import numpy as np

#create sample
arrays = [[1,1,1,2,2,2,3,3],
          [0,1,2,0,1,2,0,1]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['ID', 'INDEX'])
df = pd.DataFrame(np.abs(np.random.randn(8, 2)), index=index, columns=['Ts','Tf'])

#groupby index and learn from data
g = df.groupby(level='INDEX')
g_mean=g.mean()
g_std = g.std()

#groupby ID and look if some ID is an outlier
g = df.groupby(level='ID')
test = g.apply(lambda x: True if np.abs(x-g_mean) <= 3*g_std else False)

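The last line of the code does not work, because in the last group I am comparing two DataFrames of different shapes (ID 3 has only two rows, while g_mean and g_std have three). Any suggestions?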

1 answer:

Answer 0: (score: 1)

You can use:

for i in range(0, 10):
    print('a%d' % i)
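
For the filter described in the question, one possible approach is sketched below. It is not taken from the answer above. It relies on groupby(...).transform, which returns per-group statistics aligned row by row with df, so the comparison never involves frames of different shapes. The fixed random seed, the names within, id_ok and result, and the rule that an ID is dropped as soon as any of its values fails the test are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same sample layout as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
arrays = [[1, 1, 1, 2, 2, 2, 3, 3],
          [0, 1, 2, 0, 1, 2, 0, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=['ID', 'INDEX'])
df = pd.DataFrame(np.abs(rng.standard_normal((8, 2))),
                  index=index, columns=['Ts', 'Tf'])

# Per-INDEX mean and std, broadcast back to the shape and index of df.
g = df.groupby(level='INDEX')
g_mean = g.transform('mean')
g_std = g.transform('std')

# Element-wise criterion from the question: |x - mean| <= 3 * std.
within = (df - g_mean).abs() <= 3 * g_std

# Assumption: keep an ID only if every value in every one of its rows passes.
id_ok = within.all(axis=1).groupby(level='ID').transform('all')
result = df[id_ok]
print(result)
```

To drop only the offending rows rather than whole IDs, df[within.all(axis=1)] would be enough. Note that with just two or three observations per INDEX group, as in this sample, no value can lie more than 3 sample standard deviations from its group mean, so the filter only starts removing data on larger groups.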