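I'm trying to write a function that takes a df and a dictionary mapping column names to lists of values. The function should slice the rows (index) so that it returns only the rows whose values match the values given for each key in "criteria".
For example: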
df_isr13 = filterby_criteria(df, {"Area":["USA"], "Year":[2013]})
Only rows with "Year" = 2013 and "Area" = "USA" should be included in the output.
I tried:
def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        return df[df[key].isin(values)]
But this only applies the first criterion.
How can I get a new DataFrame that satisfies all of the criteria using pd.DataFrame.isin()?
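Only the first criterion is applied because `return` sits inside the `for` loop, so the function exits on its first iteration. A minimal sketch with just that one fix (this is not taken from either answer below): narrow `df` on every pass and return once the loop has finished. The sample frame is made up for illustration.

```python
import pandas as pd


def filterby_criteria(df, criteria):
    # Narrow the frame one criterion at a time: each pass keeps only the
    # rows whose value in column `key` is one of the allowed `values`.
    for key, values in criteria.items():
        df = df[df[key].isin(values)]
    # Return only after every criterion has been applied.
    return df


# Hypothetical sample data for illustration.
df = pd.DataFrame({"Area": ["USA", "USA", "UK", "USA"],
                   "Year": [2013, 2014, 2013, 2013],
                   "Value": [1, 2, 3, 4]})
result = filterby_criteria(df, {"Area": ["USA"], "Year": [2013]})
print(result)
```

Because the rows are selected by boolean indexing, the result keeps the original index labels (here rows 0 and 3).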
Answer 0 (score: 2)
You can use a for loop and apply each criterion in turn with the pandas merge function:
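import pandas as pd

def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        # Inner-join the rows matching this criterion back onto df,
        # so df shrinks with every pass through the loop.
        df = pd.merge(df[df[key].isin(values)], df, how='inner')
    return df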
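Merging the filtered frame back onto itself does more work than necessary. It also discards the original index and, if `df` contains exact duplicate rows, multiplies them, because each duplicate on the left matches every identical row on the right. A sketch of an alternative that is not from this answer: build one `isin` mask per criterion, AND the masks together, and index once. The sample frame is hypothetical.

```python
from functools import reduce
import operator

import pandas as pd


def filterby_criteria(df, criteria):
    # One boolean Series per criterion: True where the column value is allowed.
    masks = [df[key].isin(values) for key, values in criteria.items()]
    # AND all masks together, starting from an all-True mask so that an
    # empty criteria dict keeps every row.
    combined = reduce(operator.and_, masks, pd.Series(True, index=df.index))
    return df[combined]


# Hypothetical data: rows 0, 3 and 4 are exact duplicates.
df = pd.DataFrame({"Area": ["USA", "USA", "UK", "USA", "USA"],
                   "Year": [2013, 2014, 2013, 2013, 2013],
                   "Value": [1, 2, 3, 1, 1]})
result = filterby_criteria(df, {"Area": ["USA"], "Year": [2013]})
print(result)
```

Because the mask is applied to `df` directly, each matching row appears exactly once and keeps its original index label.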
Answer 1 (score: 1)
Consider a simple merge of two data frames, since by default merge joins on all columns whose names appear in both:
from itertools import product
import pandas as pd

def filterby_criteria(df, criteria):
    # EXTRACT DICT ITEMS
    k, v = criteria.keys(), criteria.values()
    # BUILD DF OF ALL POSSIBLE MATCHES (one row per combination of allowed values)
    # (the inplace argument of set_axis was removed in pandas 2.0, so pass columns directly)
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    # RETURN MERGED DF (inner join on the shared column names)
    return df.merge(all_matches)
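It may help to look at the lookup table that `merge` joins against. With two criteria of two values each, `itertools.product` yields every combination, so the table has 2 × 2 = 4 rows. An inner merge then keeps the rows of `df` whose (`Tool`, `Year`) pair appears in that table. A small sketch, using the same criteria as the demo:

```python
from itertools import product

import pandas as pd

criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}
# Cartesian product of the allowed values: one row per allowed combination.
all_matches = pd.DataFrame(list(product(*criteria.values())),
                           columns=list(criteria.keys()))
print(all_matches)
```

Two caveats follow from merge matching by equality. The values in `criteria` should have the same dtype as the columns (for example `2013`, not `"2013"`, for an integer `Year` column; pandas may raise an error on mismatched key dtypes). A value repeated within one criteria list would also produce duplicate rows in the result.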
Demonstration with randomly seeded data:
Data
import numpy as np
import pandas as pd
np.random.seed(61219)
tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
years = list(range(2013, 2019))
random_df = pd.DataFrame({'Tool': np.random.choice(tools, 500),
                          'Int': np.random.randint(1, 10, 500),
                          'Num': np.random.uniform(1, 100, 500),
                          'Year': np.random.choice(years, 500)
                          })
print(random_df.head(10))
# Tool Int Num Year
# 0 spss 4 96.465327 2016
# 1 sas 7 23.455771 2016
# 2 r 5 87.349825 2014
# 3 julia 4 18.214028 2017
# 4 julia 7 17.977237 2016
# 5 stata 3 41.196579 2013
# 6 stata 8 84.943676 2014
# 7 python 4 60.576030 2017
# 8 spss 4 47.024075 2018
# 9 stata 3 87.271072 2017
Function call
criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}

def filterby_criteria(df, criteria):
    k, v = criteria.keys(), criteria.values()
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    return df.merge(all_matches)

final_df = filterby_criteria(random_df, criteria)
Output
print(final_df)
# Tool Int Num Year
# 0 python 8 96.611384 2015
# 1 python 7 66.782828 2015
# 2 python 9 73.638629 2015
# 3 python 4 70.763264 2015
# 4 python 2 28.311917 2015
# 5 python 3 69.888967 2015
# 6 python 8 97.609694 2015
# 7 python 3 59.198276 2015
# 8 python 3 64.497017 2015
# 9 python 8 87.672138 2015
# 10 python 9 33.605467 2015
# 11 python 8 25.225665 2015
# 12 r 3 72.202364 2013
# 13 r 1 62.192478 2013
# 14 r 7 39.264766 2013
# 15 r 3 14.599786 2013
# 16 r 4 22.963723 2013
# 17 r 1 97.647922 2013
# 18 r 5 60.457344 2013
# 19 r 5 15.711207 2013
# 20 r 7 80.273330 2013
# 21 r 7 74.190107 2013
# 22 r 7 37.923396 2013
# 23 r 2 91.970678 2013
# 24 r 4 31.489810 2013
# 25 r 1 37.580665 2013
# 26 r 2 9.686955 2013
# 27 r 6 56.238919 2013
# 28 r 6 72.820625 2015
# 29 r 3 61.255351 2015
# 30 r 4 45.690621 2015
# 31 r 5 71.143601 2015
# 32 r 6 54.744846 2015
# 33 r 1 68.171978 2015
# 34 r 5 8.521637 2015
# 35 r 7 87.027681 2015
# 36 r 3 93.614377 2015
# 37 r 7 37.918881 2015
# 38 r 3 7.715963 2015
# 39 python 1 42.681928 2013
# 40 python 6 57.354726 2013
# 41 python 1 48.189897 2013
# 42 python 4 12.201131 2013
# 43 python 9 1.078999 2013
# 44 python 9 75.615457 2013
# 45 python 8 12.631277 2013
# 46 python 9 82.227578 2013
# 47 python 7 97.802213 2013
# 48 python 1 57.103964 2013
# 49 python 1 1.941839 2013
# 50 python 3 81.981437 2013
# 51 python 1 56.869551 2013
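As a sanity check (an addition, not part of the original answer), the merge result can be compared against a plain `isin` mask on the same seeded data. `merge` returns a fresh index, and its row order may differ between pandas versions. The sketch therefore sorts both results on every column and drops the index before comparing.

```python
from itertools import product

import numpy as np
import pandas as pd

# Recreate the seeded demo data from above.
np.random.seed(61219)
tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
years = list(range(2013, 2019))
random_df = pd.DataFrame({'Tool': np.random.choice(tools, 500),
                          'Int': np.random.randint(1, 10, 500),
                          'Num': np.random.uniform(1, 100, 500),
                          'Year': np.random.choice(years, 500)})
criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}

# Merge-based filter (the answer's approach).
all_matches = pd.DataFrame(list(product(*criteria.values())),
                           columns=list(criteria.keys()))
merged = random_df.merge(all_matches)

# Mask-based filter for comparison.
mask = (random_df['Tool'].isin(criteria['Tool'])
        & random_df['Year'].isin(criteria['Year']))
masked = random_df[mask]

# Sort on every column and drop the index so only row contents are compared.
cols = list(random_df.columns)
same = merged.sort_values(cols).reset_index(drop=True).equals(
    masked.sort_values(cols).reset_index(drop=True))
print(len(merged), len(masked), same)
```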