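Select specific rows from a grouped DataFrame

Time: 2016-12-21 09:40:39

Tags: python pandas dataframe

I have the following DataFrame. I want to filter out some groups and then select only one row from each of the groups that remain.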

  state  city  voting_majority_status_fk  other
0     A    A1                          4   True
1     A    A1                          4   True
2     A    A1                          2  False
3     A    A2                          3   True
4     B    B2                          4  False
5     B    B2                          2   True
6     C    C1                          4   True
7     C    C1                          4   True
8     C    C1                          2  False

I want to group it and take only one row from each group that passes the filter.

I want my final result to be just:

2     A    A1                          2  False
8     C    C1                          2  False

My code so far:

import pandas as pd

columns = ['state', ' city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True],
        ['A', 'A1', 4, True],
        ['A', 'A1', 2, False],
        ['A', 'A2', 3, True],
        ['B', 'B2', 4, False],
        ['B', 'B2', 2, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 4, True],
        ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]

df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
filtered_data = grouped_df.filter(VotingDataFetcher.my_filter)

class VotingDataFetcher:
    @staticmethod
    def my_filter(group):
        if 3 in group.voting_majority_status_fk.unique():
            return False
        if 2 not in group.voting_majority_status_fk.unique():
            return False
        if 4 in group.voting_majority_status_fk.unique():
            majority = group[group.voting_majority_status_fk == 4].head(1)
            if not majority.other.tolist()[0]:
                return False
            else:
                minority = group[group.voting_majority_status_fk == 2]
                tt = minority.head(1)  # <= I only want those lines.
                return True
        return False

I get the following output. The whole group is returned, but I only need the selected row from each group.

0     A    A1                          4   True
1     A    A1                          4   True
2     A    A1                          2  False <= only this one
6     C    C1                          4   True
7     C    C1                          4   True
8     C    C1                          2  False <= and this one

1 Answer:

Answer 0 (score: 1)

You need to use apply with a custom function that returns tt:

def my_filter(group):
    vuniq = group.voting_majority_status_fk.unique()
    # A group qualifies only if it contains both 4 and 2, and no 3.
    if (4 in vuniq) and (2 in vuniq) and not (3 in vuniq):
        majority = group[group.voting_majority_status_fk == 4].head(1)
        # The first status-4 row must have other == True.
        if majority.other.tolist()[0]:
            minority = group[group.voting_majority_status_fk == 2]
            tt = minority.head(1)  # <= I only want those lines.
            return tt
    # Any other group falls through and implicitly returns None.

df = pd.DataFrame(data=data, columns=columns)
grouped_df = df.groupby(['state', ' city'])
# reset_index(drop=True) replaces the resulting index with 0..n-1.
filtered_data = grouped_df.apply(my_filter).reset_index(drop=True)
print(filtered_data)
  state  city  voting_majority_status_fk  other
0     A    A1                          2  False
1     C    C1                          2  False
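To make it clearer what apply does with the pieces returned by the function, here is a minimal sketch that loops over the groups by hand and concatenates the non-None results. It is only an illustration, not how pandas implements apply. The helper name pick_row is hypothetical, and the column is assumed to be named 'city' without the leading space used above. Unlike the answer, this sketch keeps the original row labels (2 and 8) instead of resetting the index.

```python
import pandas as pd

columns = ['state', 'city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True], ['A', 'A1', 4, True], ['A', 'A1', 2, False],
        ['A', 'A2', 3, True], ['B', 'B2', 4, False], ['B', 'B2', 2, True],
        ['C', 'C1', 4, True], ['C', 'C1', 4, True], ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data, columns=columns)


def pick_row(group):
    """Return the first status-2 row of a qualifying group, else None."""
    status = group['voting_majority_status_fk']
    values = set(status.tolist())
    if 4 in values and 2 in values and 3 not in values:
        majority = group[status == 4].head(1)
        if majority['other'].iloc[0]:
            return group[status == 2].head(1)
    return None


# Call the function once per group and keep only the non-None pieces.
pieces = []
for key, group in df.groupby(['state', 'city']):
    picked = pick_row(group)
    if picked is not None:
        pieces.append(picked)

# pd.concat raises on an empty list, so fall back to an empty frame.
result = pd.concat(pieces) if pieces else df.iloc[0:0]
print(result)
```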

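You cannot use filter for this, because the function passed to it returns True or False for each group, and that value only decides whether the whole group is kept or dropped.

You can check this by applying a version of my_filter that returns True/False and printing the result for each group: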

filtered_data = grouped_df.apply(my_filter)
print (filtered_data)
state   city
A      A1        True
       A2       False
B      B2       False
C      C1        True
       C3        None
dtype: object
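Since filter only keeps or drops whole groups, another option is to split the work into two steps: first filter the groups, then select the wanted row from each group that survived. The sketch below is one possible way to do that, not the approach from the answer. The helper name keep_group is hypothetical, and the column is assumed to be named 'city' without the leading space.

```python
import pandas as pd

columns = ['state', 'city', 'voting_majority_status_fk', 'other']
data = [['A', 'A1', 4, True], ['A', 'A1', 4, True], ['A', 'A1', 2, False],
        ['A', 'A2', 3, True], ['B', 'B2', 4, False], ['B', 'B2', 2, True],
        ['C', 'C1', 4, True], ['C', 'C1', 4, True], ['C', 'C1', 2, False],
        ['C', 'C3', 2, False]]
df = pd.DataFrame(data, columns=columns)


def keep_group(group):
    """Return True only for groups that should survive the filter."""
    status = group['voting_majority_status_fk']
    has_2 = (status == 2).any()
    has_3 = (status == 3).any()
    has_4 = (status == 4).any()
    if has_3 or not has_2 or not has_4:
        return False
    # The first status-4 row must have other == True.
    return bool(group.loc[status == 4, 'other'].iloc[0])


# Step 1: filter keeps every row of each qualifying group.
kept = df.groupby(['state', 'city']).filter(keep_group)

# Step 2: within the kept rows, take the first status-2 row per group.
minority = kept[kept['voting_majority_status_fk'] == 2]
result = minority.groupby(['state', 'city']).head(1)
print(result)
```

Here the result keeps the original row labels (2 and 8); add .reset_index(drop=True) if a fresh 0..n-1 index is preferred.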