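The result should be a MultiIndex DataFrame that contains no outliers. The criterion is based on the standard deviation: np.abs(x-g_mean) <= 3*g_std
I am trying to identify statistical outliers: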
import pandas as pd
import numpy as np
#create sample
arrays = [[1,1,1,2,2,2,3,3],
[0,1,2,0,1,2,0,1]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['ID', 'INDEX'])
df = pd.DataFrame(np.abs(np.random.randn(8, 2)), index=index, columns=['Ts','Tf'])
#groupby index and learn from data
g = df.groupby(level='INDEX')
g_mean=g.mean()
g_std = g.std()
#groupby ID and look if some ID is an outlier
g = df.groupby(level='ID')
test = g.apply(lambda x: True if np.abs(x-g_mean) <= 3*g_std else False)
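The last line of the code does not work, because in the final groupby I am comparing two DataFrames of different shapes. Any suggestions?
Answer 0 (score: 1)
You can use: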
for i in range(0, 10):
    print('a%d' % i)
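For the shape mismatch described in the question, one possible approach is to let `groupby(...).transform` compute the per-`INDEX` mean and standard deviation, since it returns frames aligned row by row with `df`. The sketch below is only an illustration under stated assumptions: the helper name `drop_outlier_ids` and the `n_std` parameter are hypothetical, an ID is dropped as a whole if any of its values fails the test, and a group with an undefined standard deviation (a single row) is treated as having no outliers.

```python
import numpy as np
import pandas as pd


def drop_outlier_ids(df, n_std=3.0):
    """Drop every ID that has at least one value outside mean +/- n_std * std.

    Mean and std are computed per INDEX position across all IDs.
    (Hypothetical helper, not part of pandas.)
    """
    by_index = df.groupby(level="INDEX")
    # transform() broadcasts the group statistics back to df's shape,
    # so the comparison below is element-wise on identically shaped frames.
    mean = by_index.transform("mean")
    std = by_index.transform("std")

    # Assumption: if std is NaN (group of one row), do not flag the value.
    within = ((df - mean).abs() <= n_std * std) | std.isna()

    # A row passes if all of its columns pass; an ID passes if all rows pass.
    row_ok = within.all(axis=1)
    id_ok = row_ok.groupby(level="ID").transform("all")
    return df[id_ok]
```

Calling `drop_outlier_ids(df)` on the sample frame from the question returns a frame with the same MultiIndex layout. Note that with only two or three rows per `INDEX` group, no value can lie more than 3 sample standard deviations from the group mean (the maximum is (n-1)/sqrt(n)), so nothing will be removed from that small sample; the filter only becomes effective with more IDs.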