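I am trying to filter a DataFrame using a dictionary. However, I would like the values in filters['age'] to be treated as a list of values to exclude from df rather than to include.
Can I somehow rewrite the code below so that the result contains john 42 London instead of the current john 11 Warsaw?
My only idea is to write two filter dictionaries, one with the values to include and one with the values to exclude, and then filter df with .isin and ~.isin respectively. But maybe there is another way?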
import pandas as pd

d = {
    'name': ['john', 'mike', 'john', 'tim'],
    'age': [42, 24, 11, 66],
    'city': ['London', 'Tokyo', 'Warsaw', 'New York'],
}
filters = {
    'name': ['john', 'mike'],
    'age': [66, 11],  # I want these to be excluded, so that ages 66 and 11 do not appear in the filtered df
    'city': ['Warsaw', 'London', 'Tokyo'],
}

def get_filtered_df(df, filters):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters)
print(filtered_df)
# output is:
#    name  age    city
# 2  john   11  Warsaw
Answer 0 (score: 2)
You can simply add a condition that negates (inverts) the mask for the column you want to exclude:
...
def get_filtered_df(df, filters):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        if filter_name == 'age':
            # invert the mask so the listed ages are excluded
            mask = ~mask
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters)
print(filtered_df)
Output:
   name  age    city
0  john   42  London
1  mike   24   Tokyo
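
To see why inverting the mask works, it may help to look at the boolean Series on its own. This is only a small sketch using the ages from the question and assuming the exclusion list is [66, 11], as the comment in the question describes; the variable names `ages`, `in_list` and `not_in_list` are made up for illustration.

```python
import pandas as pd

# Ages from the question's sample data
ages = pd.Series([42, 24, 11, 66], name="age")

# True where the age appears in the list
in_list = ages.isin([66, 11])
# `~` flips every boolean, so True now marks the rows to keep
not_in_list = ~in_list

print(in_list.tolist())      # [False, False, True, True]
print(not_in_list.tolist())  # [True, True, False, False]
```

Indexing the DataFrame with the inverted mask therefore keeps only the rows whose age is not in the list.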
Answer 1 (score: 1)
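Create two lists, one with the columns whose values should be included and one with the columns whose values should be excluded, and modify the mask accordingly: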
include = ["name", "city"]
exclude = ["age"]  # listed for readability; only `include` is checked below

def get_filtered_df(df, filters, include):
    for filter_name, filter_value in filters.items():
        mask = df[filter_name].isin(filter_value)
        if filter_name not in include:
            # any column not marked for inclusion is treated as an exclusion
            mask = ~mask
        df = df[mask]
    return df

df = pd.DataFrame(d)
filtered_df = get_filtered_df(df, filters, include)
print(filtered_df)
The output is as expected:
   name  age    city
0  john   42  London
1  mike   24   Tokyo
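
The two-dictionary idea mentioned in the question could also be written so that no column name is hard-coded in the function. The following is just one possible sketch, not the only way to do it: the function name `filter_include_exclude` and its `include` / `exclude` parameters are hypothetical, and the exclusion list is assumed to be [66, 11]. It combines all conditions into a single boolean mask and indexes the DataFrame once.

```python
import pandas as pd

def filter_include_exclude(df, include=None, exclude=None):
    """Keep rows that match every `include` entry and no `exclude` entry."""
    # Start with an all-True mask aligned to the DataFrame's index
    mask = pd.Series(True, index=df.index)
    for column, values in (include or {}).items():
        mask &= df[column].isin(values)
    for column, values in (exclude or {}).items():
        mask &= ~df[column].isin(values)
    return df[mask]

df = pd.DataFrame({
    'name': ['john', 'mike', 'john', 'tim'],
    'age': [42, 24, 11, 66],
    'city': ['London', 'Tokyo', 'Warsaw', 'New York'],
})
include = {'name': ['john', 'mike'], 'city': ['Warsaw', 'London', 'Tokyo']}
exclude = {'age': [66, 11]}  # assumed exclusion list

result = filter_include_exclude(df, include=include, exclude=exclude)
print(result)
```

With this sample data the rows with ages 42 and 24 remain. Keeping the two dictionaries separate makes the intent of each filter explicit at the call site, at the cost of one extra argument.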