I have the following DataFrame. I want to filter its groups and then select only one row from each group that is left:
state city voting_majority_status_fk other
0 A A1 4 True
1 A A1 4 True
2 A A1 2 False
3 A A2 3 True
4 B B2 4 False
5 B B2 2 True
6 C C1 4 True
7 C C1 4 True
8 C C1 2 False
I want to group it and take only one row from each group that passes the filter, so my final result should be just:
2 A A1 2 False
8 C C1 2 False
My code so far:
import pandas as pd


class VotingDataFetcher:  # enclosing class (rest of it omitted)
    @staticmethod
    def my_filter(group):
        if 3 in group.voting_majority_status_fk.unique():
            return False
        if 2 not in group.voting_majority_status_fk.unique():
            return False
        if 4 in group.voting_majority_status_fk.unique():
            majority = group[group.voting_majority_status_fk == 4].head(1)
            if not majority.other.tolist()[0]:
                return False
            else:
                minority = group[group.voting_majority_status_fk == 2]
                tt = minority.head(1)  # <= I only want these rows
                return True
            return False


columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
filtered_data = grouped_df.filter(VotingDataFetcher.my_filter)
This gives me the following output: I get the whole groups back, but I only need the selected row from each group.
0 A A1 4 True
1 A A1 4 True
2 A A1 2 False <= only this one
6 C C1 4 True
7 C C1 4 True
8 C C1 2 False <= and this one
Answer (score: 1)
You need apply instead, with a custom function that returns tt (the selected rows) for the groups you want to keep:
import pandas as pd


def my_filter(group):
    vuniq = group.voting_majority_status_fk.unique()
    if (4 in vuniq) and (2 in vuniq) and (3 not in vuniq):
        majority = group[group.voting_majority_status_fk == 4].head(1)
        if majority.other.tolist()[0]:
            minority = group[group.voting_majority_status_fk == 2]
            tt = minority.head(1)  # <= I only want those lines.
            return tt
    return None  # any other group: apply leaves it out of the result


# columns and data as defined in the question
df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
filtered_data = grouped_df.apply(my_filter).reset_index(drop=True)
print(filtered_data)
state city voting_majority_status_fk other
0 A A1 2 False
1 C C1 2 False
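This works because groupby.apply concatenates whatever the function returns for each group, and groups for which it returns None are skipped. A minimal sketch on a hypothetical toy frame (the columns g and v are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: group 'x' has max v = 2, group 'y' has max v = 4.
df = pd.DataFrame({'g': ['x', 'x', 'y', 'y'], 'v': [1, 2, 3, 4]})


def first_row_if_large(grp):
    # Accepted groups return a one-row DataFrame; rejected groups return None.
    if grp['v'].max() > 2:
        return grp.head(1)
    return None


# Only group 'y' contributes a row; the None from group 'x' is dropped.
out = df.groupby('g').apply(first_row_if_large).reset_index(drop=True)
print(out)
```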
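You cannot use filter here: the function you pass to it must return a single True or False for each group, and filter only uses that value to keep or drop the entire group. It never selects individual rows.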
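A small sketch of that contract on a hypothetical toy frame: the function passed to filter receives each whole group and must return one boolean, and returning rows (a DataFrame) instead is rejected with a TypeError.

```python
import pandas as pd

# Hypothetical toy data with two groups, 'x' and 'y'.
df = pd.DataFrame({'g': ['x', 'x', 'y'], 'v': [1, 2, 3]})

# One boolean per group: 'x' (max 2) is dropped, 'y' (max 3) is kept as a whole.
kept = df.groupby('g').filter(lambda grp: grp['v'].max() > 2)
print(kept)

# Returning rows instead of a boolean is not allowed.
rejected = False
try:
    df.groupby('g').filter(lambda grp: grp.head(1))
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)
```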
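You can check this by passing the question's original filter function to apply, which shows the single value it returns for each group:
filtered_data = grouped_df.apply(VotingDataFetcher.my_filter)
print(filtered_data)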
state city
A A1 True
A2 False
B B2 False
C C1 True
C3 None
dtype: object
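If you would rather keep filter for the group-level decision, a two-step sketch also works: filter whole groups first, then take the first voting_majority_status_fk == 2 row of each remaining group with groupby(...).head(1). keep_group below is a hypothetical boolean-only rewrite of the question's rules (reject any group containing 3, require both 2 and 4, and require other to be True on the first fk == 4 row):

```python
import pandas as pd

columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data=data, columns=columns)
keys = ['state', ' city']


def keep_group(group):
    # Hypothetical helper: group-level decision only, one boolean per group.
    fks = set(group.voting_majority_status_fk.tolist())
    if 3 in fks or 2 not in fks or 4 not in fks:
        return False
    # The first fk == 4 row must have other == True.
    first_majority = group.loc[group.voting_majority_status_fk == 4, 'other'].iloc[0]
    return bool(first_majority)


# Step 1: keep or drop whole groups.
kept = df.groupby(keys).filter(keep_group)
# Step 2: row-level selection, first fk == 2 row from each surviving group.
result = kept[kept.voting_majority_status_fk == 2].groupby(keys).head(1)
print(result)
```

Unlike the apply version, this keeps the original row labels (2 and 8), which matches the index shown in the desired output in the question.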