Suppose I have actual sales figures for my salespeople, like this:
import pandas as pd
df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Q3 sales": [105, 82, 230, 58]})
   Salesperson id  Q3 sales
0               1       105
1               2        82
2               3       230
3               4        58
I also have their sales quotas, like this:
quotas_df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Quota": [88, 95, 200, 65]})
quotas_df = quotas_df.set_index('Salesperson id')
                Quota
Salesperson id
1                  88
2                  95
3                 200
4                  65
I want a subset of df that contains only the rows where the salesperson exceeded their sales quota. I tried the following:
filtered_df = df[(df['Q3 sales'] > quotas_df.loc[df['Salesperson id']]['Quota'])]
This fails with:
ValueError: Can only compare identically-labeled Series objects
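Any pointers on the best way to do this?
Answer 0 (score: 2)
You get the error because the indexes of the two DataFrames are not aligned: df is indexed 0 to 3, while quotas_df is indexed by 'Salesperson id' (1 to 4), so the two Series being compared carry different labels. Set 'Salesperson id' as the index of df as well, so both sides share the same labels: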
(
df.set_index('Salesperson id')
.loc[lambda x: x['Q3 sales'] > quotas_df['Quota']]
)
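To make the label mismatch concrete, here is a minimal sketch that reuses the question's data. It prints the index labels on each side of the failing comparison, then compares by position with to_numpy() as one possible workaround. The variable names sales, quota and mask are introduced here for illustration only, and the positional comparison assumes both sides are in the same row order.

```python
import pandas as pd

df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], 'Q3 sales': [105, 82, 230, 58]})
quotas_df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], 'Quota': [88, 95, 200, 65]})
quotas_df = quotas_df.set_index('Salesperson id')

sales = df['Q3 sales']                                  # labelled by df's default index
quota = quotas_df.loc[df['Salesperson id'], 'Quota']    # labelled by salesperson id

print(list(sales.index))
print(list(quota.index))

# Comparing the underlying arrays ignores labels entirely.
# Assumption: both sides are in the same row order, which holds here
# because quota was looked up in the order of df['Salesperson id'].
mask = sales.to_numpy() > quota.to_numpy()
filtered_df = df[mask]
print(filtered_df)
```

Unlike the set_index approach, this keeps df's original index and leaves 'Salesperson id' as a regular column.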
Answer 1 (score: 1)
Use Series.map to look up each salesperson's quota by id:
import pandas as pd

df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Q3 sales": [105, 82, 230, 58]})
quotas_df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Quota": [88, 95, 200, 65]})
# Map each id to its quota; the result keeps df's index, so the comparison aligns
s = df['Salesperson id'].map(quotas_df.set_index('Salesperson id')['Quota'])
filtered_df = df[df['Q3 sales'] > s]
print(filtered_df)
   Salesperson id  Q3 sales
0               1       105
2               3       230
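One behaviour of the map approach worth checking is what happens when a salesperson has no quota on record. The sketch below uses a hypothetical id 5 and a hypothetical quotas Series (neither is in the question's data): map returns NaN for the unmatched id, and a comparison against NaN evaluates to False, so that row is dropped without any error.

```python
import pandas as pd

# Hypothetical data: salesperson 5 has sales but no quota entry.
df = pd.DataFrame({'Salesperson id': [1, 2, 5], 'Q3 sales': [105, 82, 300]})
quotas = pd.Series({1: 88, 2: 95}, name='Quota')

# Ids missing from the quotas index map to NaN.
s = df['Salesperson id'].map(quotas)
print(s)

# Any comparison with NaN is False, so the unmatched row is excluded.
filtered_df = df[df['Q3 sales'] > s]
print(filtered_df)
```

If missing quotas should be reported rather than silently filtered out, checking s.isna() before filtering is one option.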
Answer 2 (score: 1)
You can merge the two DataFrames and then filter as usual:
df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Q3 sales": [105, 82, 230, 58]})
quotas_df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], "Quota": [88, 95, 200, 65]})
filtered_df = df.merge(quotas_df, on='Salesperson id')
filtered_df[filtered_df['Q3 sales'] > filtered_df['Quota']]
Output:
   Salesperson id  Q3 sales  Quota
0               1       105     88
2               3       230    200
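If the result should have only df's original columns, the merge, the filter and the removal of the helper column can be chained. This is a sketch, not part of the answer above: validate='one_to_one' is an optional safeguard that raises if an id appears more than once on either side, and the backticks in query are needed because 'Q3 sales' contains a space.

```python
import pandas as pd

df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], 'Q3 sales': [105, 82, 230, 58]})
quotas_df = pd.DataFrame({'Salesperson id': [1, 2, 3, 4], 'Quota': [88, 95, 200, 65]})

filtered_df = (
    df.merge(quotas_df, on='Salesperson id', how='inner', validate='one_to_one')
      .query('`Q3 sales` > Quota')   # keep rows where sales exceed the quota
      .drop(columns='Quota')         # drop the helper column after filtering
)
print(filtered_df)
```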