Get a new DataFrame filtered by multiple criteria with pd.DataFrame.isin()

Time: 2019-06-12 19:25:36

Tags: python pandas dataframe

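I am trying to write a function that takes a df and a dictionary mapping columns to values. The function should slice the rows (index) so that it returns only the rows whose values match the values listed for each key in the criteria. For example, with df_isr13 = filterby_criteria(df, {"Area":["USA"], "Year":[2013]}), the output should contain only the rows with "Year" = 2013 and "Area" = "USA".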

I tried:

def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        return df[df[key].isin(values)]

But I only get the first criterion applied. How can I get a new DataFrame that satisfies all of the criteria using pd.DataFrame.isin()?
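Why only the first criterion is applied: the return statement sits inside the for loop, so the function returns right after filtering on the first key and never reaches the others. A minimal sketch of the direct fix, which narrows df one criterion at a time and returns only after the loop (the small Area/Year/Value frame is hypothetical sample data):

```python
import pandas as pd

def filterby_criteria(df, criteria):
    # Narrow the frame one criterion at a time; the return is outside the loop,
    # so every key in `criteria` is applied before the result is returned.
    for key, values in criteria.items():
        df = df[df[key].isin(values)]
    return df

# Hypothetical sample data for illustration
df = pd.DataFrame({"Area": ["USA", "USA", "UK", "USA"],
                   "Year": [2013, 2014, 2013, 2013],
                   "Value": [1, 2, 3, 4]})
df_isr13 = filterby_criteria(df, {"Area": ["USA"], "Year": [2013]})
print(df_isr13)  # rows 0 and 3 only
```

Each iteration works on the already-filtered frame, so the criteria are combined with AND.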

2 answers:

Answer 0 (score: 2)

You can loop over the criteria and apply each one with the pandas merge function:

import pandas as pd

def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        # Inner merge on all shared columns keeps only rows passing this criterion
        df = pd.merge(df[df[key].isin(values)], df, how='inner')
    return df
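Since the question asks about pd.DataFrame.isin() specifically: DataFrame.isin also accepts a dict mapping column names to allowed values, so all criteria can be checked in one pass and combined with .all(axis=1), without a loop or a merge. A sketch under the assumption that every key in criteria is a column of df (a missing key raises a KeyError); the sample frame is hypothetical:

```python
import pandas as pd

def filterby_criteria(df, criteria):
    # DataFrame.isin accepts a {column: allowed values} dict. Selecting only the
    # criteria columns first means every checked column has an entry in the dict.
    mask = df[list(criteria)].isin(criteria).all(axis=1)  # AND across criteria
    return df[mask]

# Hypothetical sample data for illustration
df = pd.DataFrame({"Area": ["USA", "USA", "UK", "USA"],
                   "Year": [2013, 2014, 2013, 2013],
                   "Value": [1, 2, 3, 4]})
result = filterby_criteria(df, {"Area": ["USA"], "Year": [2013]})
print(result)
```

Unlike the merge-based loop, boolean indexing keeps df's original index and never multiplies rows that are exact duplicates in df.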

Answer 1 (score: 1)

Consider a simple merge of two data frames, since by default merge joins on all column names the two frames share:

from itertools import product
import pandas as pd

def filterby_criteria(df, criteria):
    # EXTRACT DICT ITEMS
    k, v = criteria.keys(), criteria.values()
    # BUILD DF OF ALL POSSIBLE MATCHES (one row per combination of allowed values)
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    # RETURN MERGED DF (inner merge on the shared column names)
    return df.merge(all_matches)
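To see what the merge joins against, this sketch builds the all-matches frame on its own for the criteria used in the demo: itertools.product yields every combination of the allowed values, one combination per row.

```python
from itertools import product

import pandas as pd

criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}
# One row per combination of allowed values; columns follow the dict's key order
all_matches = pd.DataFrame(list(product(*criteria.values())), columns=list(criteria))
# Rows: (python, 2013), (python, 2015), (r, 2013), (r, 2015)
print(all_matches)
```

The frame has as many rows as the product of the list lengths. An inner merge on the shared columns (Tool and Year here) then keeps exactly the rows of df whose value pair appears in some combination, which is the same as requiring every criterion to hold.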

Demonstration with randomly generated, seeded data:

Data

import numpy as np
import pandas as pd

np.random.seed(61219)

tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
years = list(range(2013, 2019))
random_df = pd.DataFrame({'Tool': np.random.choice(tools, 500),
                          'Int': np.random.randint(1, 10, 500),
                          'Num': np.random.uniform(1, 100, 500),
                          'Year': np.random.choice(years, 500)
                          })

print(random_df.head(10))
#      Tool  Int        Num  Year
# 0    spss    4  96.465327  2016
# 1     sas    7  23.455771  2016
# 2       r    5  87.349825  2014
# 3   julia    4  18.214028  2017
# 4   julia    7  17.977237  2016
# 5   stata    3  41.196579  2013
# 6   stata    8  84.943676  2014
# 7  python    4  60.576030  2017
# 8    spss    4  47.024075  2018
# 9   stata    3  87.271072  2017

Function call

criteria = {"Tool":["python", "r"], "Year":[2013, 2015]}

def filterby_criteria(df, criteria):
    k, v = criteria.keys(), criteria.values()
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    return df.merge(all_matches)

final_df = filterby_criteria(random_df, criteria)

Output

print(final_df)
#       Tool  Int        Num  Year
# 0   python    8  96.611384  2015
# 1   python    7  66.782828  2015
# 2   python    9  73.638629  2015
# 3   python    4  70.763264  2015
# 4   python    2  28.311917  2015
# 5   python    3  69.888967  2015
# 6   python    8  97.609694  2015
# 7   python    3  59.198276  2015
# 8   python    3  64.497017  2015
# 9   python    8  87.672138  2015
# 10  python    9  33.605467  2015
# 11  python    8  25.225665  2015
# 12       r    3  72.202364  2013
# 13       r    1  62.192478  2013
# 14       r    7  39.264766  2013
# 15       r    3  14.599786  2013
# 16       r    4  22.963723  2013
# 17       r    1  97.647922  2013
# 18       r    5  60.457344  2013
# 19       r    5  15.711207  2013
# 20       r    7  80.273330  2013
# 21       r    7  74.190107  2013
# 22       r    7  37.923396  2013
# 23       r    2  91.970678  2013
# 24       r    4  31.489810  2013
# 25       r    1  37.580665  2013
# 26       r    2   9.686955  2013
# 27       r    6  56.238919  2013
# 28       r    6  72.820625  2015
# 29       r    3  61.255351  2015
# 30       r    4  45.690621  2015
# 31       r    5  71.143601  2015
# 32       r    6  54.744846  2015
# 33       r    1  68.171978  2015
# 34       r    5   8.521637  2015
# 35       r    7  87.027681  2015
# 36       r    3  93.614377  2015
# 37       r    7  37.918881  2015
# 38       r    3   7.715963  2015
# 39  python    1  42.681928  2013
# 40  python    6  57.354726  2013
# 41  python    1  48.189897  2013
# 42  python    4  12.201131  2013
# 43  python    9   1.078999  2013
# 44  python    9  75.615457  2013
# 45  python    8  12.631277  2013
# 46  python    9  82.227578  2013
# 47  python    7  97.802213  2013
# 48  python    1  57.103964  2013
# 49  python    1   1.941839  2013
# 50  python    3  81.981437  2013
# 51  python    1  56.869551  2013
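As a sanity check on the demo, a sketch that rebuilds the same seeded data and compares the merge result with plain isin-based boolean indexing. The comparison assumes the two may differ in row order and index, so both frames are sorted and re-indexed first; check_dtype=False is used because integer dtypes can differ by platform.

```python
from itertools import product

import numpy as np
import pandas as pd

# Rebuild the demo data exactly as above
np.random.seed(61219)
tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
years = list(range(2013, 2019))
random_df = pd.DataFrame({'Tool': np.random.choice(tools, 500),
                          'Int': np.random.randint(1, 10, 500),
                          'Num': np.random.uniform(1, 100, 500),
                          'Year': np.random.choice(years, 500)})
criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}

# Merge-based result and isin-mask result
all_matches = pd.DataFrame(list(product(*criteria.values())), columns=list(criteria))
merged = random_df.merge(all_matches)
masked = random_df[random_df[list(criteria)].isin(criteria).all(axis=1)]

# Sort both the same way and drop the index so only row contents are compared
cols = list(random_df.columns)
pd.testing.assert_frame_equal(merged.sort_values(cols).reset_index(drop=True),
                              masked.sort_values(cols).reset_index(drop=True),
                              check_dtype=False)
print(len(merged), "rows match")
```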
