Pandas: removing outliers from a DataFrame

Time: 2020-07-10 15:28:18

Tags: python pandas

I am following this logic:

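import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.randn(100, 3))
# Keep only rows where every column lies within 3 standard deviations of its mean
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]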

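My df has several columns, including value1, value2, description, task, and so on. As a result I am running into trouble because A) half of the columns are text and B) I only want to remove outliers from the value1 column. I know the code above removes rows that have an outlier in either value1 or value2. How would I adjust it to look only at value1?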

Updated code:

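# combo, yvar, xvar and the lists col1..col4 are assumed to be defined elsewhere
for y in yvar:
    # Rows for this financial metric with a non-zero value
    temp = combo[(combo['Financial Metric'] == y) & (combo['Financial Value'] != 0)]
    # Drop rows whose Financial Value is 3 or more standard deviations from the mean
    temp = temp.loc[np.abs(stats.zscore(temp['Financial Value'])) < 3]
    for x in xvar:
        # Same two steps for the external metric
        temp2 = temp[(temp['External Metric'] == x) & (temp['External Value'] != 0)]
        temp2 = temp2.loc[np.abs(stats.zscore(temp2['External Value'])) < 3]
        c = len(temp2.index)  # number of rows left after filtering
        r = temp2['Financial Value'].corr(temp2['External Value'])  # correlation of the two value columns
        col1.append(y)
        col2.append(x)
        col3.append(r)
        col4.append(c)
        temp2.plot(x='External Value', y='Financial Value', kind='scatter')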

1 answer:

Answer 0 (score: 0)

You can use loc to filter the DataFrame based on the value1 column alone, like this:

df.loc[np.abs(stats.zscore(df['value1'])) < 3]
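To see how this behaves when text columns are present, here is a minimal sketch. The data is made up for illustration: the column names value1, value2 and description follow the question, while the values themselves (including the single extreme value 50.0 in value1) are assumptions, not the asker's data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 99 ordinary values plus one extreme value in value1,
# a second numeric column, and a text column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value1": np.append(rng.normal(0, 1, 99), 50.0),
    "value2": rng.normal(0, 1, 100),
    "description": ["row %d" % i for i in range(100)],
})

# The z-scores are computed from value1 only, so value2 and the text
# column play no part in deciding which rows are kept.
mask = np.abs(stats.zscore(df["value1"])) < 3
filtered = df.loc[mask]

print(len(df), len(filtered))
```

Because the mask is built from a single numeric column, the text columns never reach stats.zscore, and every column is carried through to the result unchanged. One thing to watch for: if value1 contains NaN, stats.zscore returns NaN for the whole column by default, so the mask would be all False; passing nan_policy='omit' is one way around that.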