Filter a DataFrame by dictionary values, sometimes including the values and sometimes excluding them

Time: 2019-06-26 12:23:35

Tags: python pandas

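I am trying to filter a DataFrame using a dictionary.

However, I want filters['age'] to be treated as a list of values to exclude from df rather than to include.

Can I somehow rewrite the code below so that the output is john 42 London instead of the current john 11 Warsaw?

My only idea is to write two filter dictionaries, one holding the values to include and the other the values to exclude, and then filter df with .isin and ~.isin respectively. But maybe there is another way?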

import pandas as pd

d = {
    'name': ['john', 'mike', 'john', 'tim'],
    'age': [42, 24, 11, 66],
    'city': ['London', 'Tokyo', 'Warsaw', 'New York'],
}

filters = {
    'name': ['john', 'mike'],
    'age': [66, 11, 24], # I want these to be excluded, so that rows with age 66, 11 or 24 are not in the filtered df
    'city': ['Warsaw', 'London', 'Tokyo'],
}

def get_filtered_df(df, filters):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters)
print(filtered_df)

# output is:
# name  age    city
# john   11  Warsaw 

2 answers:

Answer 0 (score: 2)

You can simply add a condition that negates (inverts) the mask for the age column:

...

def get_filtered_df(df, filters):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        if filter_name == 'age':
            mask = ~mask
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters)
print(filtered_df)

Output (with d and filters as defined in the question, where 24 is also in the excluded ages):

   name  age    city
0  john   42  London
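
As a small illustration of why inverting the mask works: isin returns a boolean Series, and ~ flips it element by element. The sketch below uses only the age values from the question; the variable names are made up for the example.

```python
import pandas as pd

# The age column from the question's data
ages = pd.Series([42, 24, 11, 66], name="age")

# isin returns a boolean Series: True where the value is in the list
in_list = ages.isin([66, 11, 24])

# ~ flips every element, so True now marks values NOT in the list
not_in_list = ~in_list

print(in_list.tolist())      # [False, True, True, True]
print(not_in_list.tolist())  # [True, False, False, False]
```

Indexing with df[not_in_list] would therefore keep only the rows whose age is outside the list.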

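Answer 1 (score: 1)

Create two lists, one with the filter keys whose values should be included and one with the keys whose values should be excluded, and modify the mask accordingly: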

include = ["name", "city"]
exclude = ["age"]

def get_filtered_df(df, filters, include):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        if filter_name not in include:
            mask = ~mask
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters, include)  # include must be passed, otherwise this raises a TypeError
print(filtered_df)

The output is as expected (with d and filters as defined in the question):

   name  age    city
0  john   42  London
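
For comparison, the two-dictionary idea mentioned in the question could look roughly like the sketch below. The helper name filter_df and its include/exclude parameters are hypothetical, not something from the answers above. It builds one combined boolean mask instead of slicing df repeatedly.

```python
import pandas as pd

def filter_df(df, include=None, exclude=None):
    """Hypothetical helper: include/exclude map column name -> list of values."""
    # Start with every row selected
    mask = pd.Series(True, index=df.index)
    # Keep rows whose value is in each include list
    for col, values in (include or {}).items():
        mask &= df[col].isin(values)
    # Drop rows whose value is in any exclude list
    for col, values in (exclude or {}).items():
        mask &= ~df[col].isin(values)
    return df[mask]

df = pd.DataFrame({
    'name': ['john', 'mike', 'john', 'tim'],
    'age': [42, 24, 11, 66],
    'city': ['London', 'Tokyo', 'Warsaw', 'New York'],
})

result = filter_df(
    df,
    include={'name': ['john', 'mike'], 'city': ['Warsaw', 'London', 'Tokyo']},
    exclude={'age': [66, 11, 24]},
)
print(result)
```

With the question's data this should leave only the john / 42 / London row. Keeping the include and exclude values in separate dictionaries avoids hard-coding column names inside the function.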