Question

I am following this logic:

import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.randn(100, 3))
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]

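My df has multiple columns, including value1, value2, description, task, and so on. So I am running into trouble with A) half of the columns being text, and B) removing outliers from the value1 column only. I know the code above drops any row that has an outlier in either value1 or value2. How would I adjust it to look only at value1?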

Updated code:

for y in yvar:
    temp = combo
    temp = temp[(temp['Financial Metric'] == y) & (temp['Financial Value'] != 0)]
    temp = temp.loc[np.abs(stats.zscore(temp['Financial Value'])) < 3]
    for x in xvar:
        temp2 = temp
        temp2 = temp2[(temp2['External Metric'] == x) & (temp2['External Value'] != 0)]
        temp2 = temp2.loc[np.abs(stats.zscore(temp2['External Value'])) < 3]
        c = len(temp2.index)
        r = temp2['Financial Value'].corr(temp2['External Value'])
        col1.append(y)
        col2.append(x)
        col3.append(r)
        col4.append(c)
        temp2.plot(x ='External Value', y='Financial Value', kind = 'scatter')

Answer 1

You can use loc to filter the DataFrame based only on the value1 column, like this:

df.loc[np.abs(stats.zscore(df['value1'])) < 3]
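
For a fuller picture, here is a minimal sketch of the same idea on a frame that mixes numeric and text columns. The data is made up for illustration: the column names follow the question, but the values, the planted outliers, and the names `mask` and `filtered` are assumptions, not the asker's real data. Because the z-score is computed from value1 alone, the text columns never reach `stats.zscore`, and an extreme value in value2 does not cause a row to be dropped.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: two numeric columns plus two text columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value1": rng.normal(size=100),
    "value2": rng.normal(size=100),
    "description": ["item"] * 100,
    "task": ["demo"] * 100,
})
df.loc[0, "value1"] = 50.0  # planted outlier in value1 (should be dropped)
df.loc[1, "value2"] = 50.0  # planted outlier in value2 (should be kept)

# Boolean mask built from value1 only; True means "within 3 standard deviations".
mask = np.abs(stats.zscore(df["value1"])) < 3

# Row filter: every column, text included, is carried along unchanged.
filtered = df.loc[mask]
print(len(df), len(filtered))
```

Note that `stats.zscore` propagates NaN by default, so if value1 can contain missing values you would probably want to drop or fill them first, or look at the function's `nan_policy` argument.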

Pandas: removing outliers from a DataFrame
