I am following the logic below:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.randn(100, 3))
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]
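My df has multiple columns, including value1, value2, description, task, and so on. So I am running into trouble with A) half of the columns being text and B) removing outliers based on the value1 column only. I know the code above removes rows that have an outlier in either value1 or value2. How would I adjust it to look only at value1?
Updated code: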
for y in yvar:
    temp = combo
    temp = temp[(temp['Financial Metric'] == y) & (temp['Financial Value'] != 0)]
    temp = temp.loc[np.abs(stats.zscore(temp['Financial Value'])) < 3]
    for x in xvar:
        temp2 = temp
        temp2 = temp2[(temp2['External Metric'] == x) & (temp2['External Value'] != 0)]
        temp2 = temp2.loc[np.abs(stats.zscore(temp2['External Value'])) < 3]
        c = len(temp2.index)
        r = temp2['Financial Value'].corr(temp2['External Value'])
        col1.append(y)
        col2.append(x)
        col3.append(r)
        col4.append(c)
        temp2.plot(x='External Value', y='Financial Value', kind='scatter')
Answer 0: (score: 0)
You can use loc to filter the DataFrame based on the value1 column only, like this:
df.loc[np.abs(stats.zscore(df['value1'])) < 3]
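As a minimal, self-contained sketch of the same idea: the column names come from the question, but the sample data, the planted outliers, and the variable names z and filtered are made up for illustration. Because the z-score is computed from value1 alone, the text columns are never passed to stats.zscore, and an outlier in value2 does not cause its row to be dropped.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data: two numeric columns and two text columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value1": rng.normal(size=100),
    "value2": rng.normal(size=100),
    "description": ["item"] * 100,
    "task": ["demo"] * 100,
})
df.loc[0, "value1"] = 50.0  # planted outlier in value1 (row should be dropped)
df.loc[1, "value2"] = 50.0  # planted outlier in value2 (row should be kept)

# Z-scores are computed for value1 only; the other columns are ignored.
z = np.abs(stats.zscore(df["value1"].to_numpy()))

# Boolean mask with one entry per row; keep rows within 3 standard deviations.
filtered = df.loc[z < 3]

print(len(df), len(filtered))
```

One thing to watch for: if value1 contains NaN, stats.zscore with its default settings returns NaN for every entry, so the comparison is False everywhere and all rows are dropped. Dropping or filling the missing values before filtering avoids that.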