Python Pandas: How to remove records with unreasonable values

Time: 2017-11-10 15:28:30

Tags: python pandas dataframe

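How can I remove records from a DataFrame that contain outliers, that is, values in one or more columns that are more than 3 standard deviations from the mean?

Example: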

row0    2    3    4    3  
row1    2    3    4    3  
row2    2    3    432  3  
row3    2    3    4    3

I want to remove row2 because of the value 432.

Thanks.

1 answer:

Answer 0 (score: 0)

import numpy as np
import pandas as pd

# Build a small example frame; NumPy stores the mixed values as strings
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
df = pd.DataFrame(data=data[1:, 1:],
                  index=data[1:, 0],
                  columns=data[0, 1:])
# Convert the string values to numbers
df1 = df.apply(pd.to_numeric)
# Calculate the mean and standard deviation over all values in the frame
mean = df1.stack().mean()
std = df1.stack().std()
# Col3 is the upper limit and Col4 the lower limit (mean +/- 3 * std).
# They stay as floats: casting to int would truncate the limits toward zero.
df1['Col3'] = mean + (std * 3)
df1['Col4'] = mean - (std * 3)
# See whether the values fall strictly between mean-(3*STD) and mean+(3*STD)
df1['Between1'] = (df1['Col1'] > df1['Col4']) & (df1['Col1'] < df1['Col3'])
df1['Between2'] = (df1['Col2'] > df1['Col4']) & (df1['Col2'] < df1['Col3'])
print(df1.head())
# Keep only the rows where both checks are True
df1 = df1[df1['Between1'] & df1['Between2']]
print(df1.head())
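
The answer above computes a single mean and standard deviation over the whole frame and names two columns by hand. Since the question asks about values that are extreme within one or more columns, a per-column variant may be closer to what is wanted. The following is only a minimal sketch, assuming every column is numeric; the function name `drop_outlier_rows` and its `n_std` parameter are made up for illustration, and the 20 extra rows are invented sample data, not part of the question.

```python
import pandas as pd


def drop_outlier_rows(df: pd.DataFrame, n_std: float = 3.0) -> pd.DataFrame:
    """Drop rows where any value lies more than n_std standard
    deviations from the mean of its own column."""
    std = df.std()  # sample standard deviation per column (ddof=1)
    # A constant column has std 0; turn that into NaN so its z-scores
    # become NaN instead of dividing by zero.
    z = (df - df.mean()).abs() / std.where(std > 0)
    # NaN > n_std evaluates to False, so constant columns never flag a row.
    is_outlier_row = (z > n_std).any(axis=1)
    return df[~is_outlier_row]


# Hypothetical data: 20 ordinary rows plus one row holding the 432
# from the question.
rows = [[2, 3, 4, 3]] * 20 + [[2, 3, 432, 3]]
df = pd.DataFrame(rows,
                  columns=["a", "b", "c", "d"],
                  index=[f"row{i}" for i in range(len(rows))])
cleaned = drop_outlier_rows(df)

# The four rows exactly as given in the question.
small = pd.DataFrame([[2, 3, 4, 3], [2, 3, 4, 3], [2, 3, 432, 3], [2, 3, 4, 3]],
                     index=["row0", "row1", "row2", "row3"])
small_default = drop_outlier_rows(small)
small_strict = drop_outlier_rows(small, n_std=1.0)
```

One thing to watch for: with only the four rows shown in the question, 432 is just 1.5 sample standard deviations from its column mean, so a threshold of 3 keeps row2. With n rows, the largest possible z-score under the sample standard deviation is (n - 1) / sqrt(n), so the 3-sigma rule can only flag anything once there are at least 11 rows. For very small frames, a lower `n_std` is needed.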