Filter a pandas DataFrame based on column values of another DataFrame

Time: 2020-09-22 18:46:28

Tags: python pandas dataframe

df1:

Id   Country  Product
1    india    cotton
2    germany  shoes
3    algeria  bags

df2:

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255
5    germany  shoes    25    635
6    germany  shoes    65    458
7    germany  shoes    96    455
8    germany  shoes    69    255
9    algeria  bags     25    635
10   algeria  bags     89    788
11   algeria  bags     96    455
12   algeria  bags     78    165

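I need to filter df2 based on the Country and Product columns of df1 and create new data frames. For example, df1 contains 3 unique Country/Product combinations, so the number of resulting data frames is 3.

Output: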

df_India_Cotton :

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255

df_germany_Product:

id   Country  Product  Qty   Sales
1    germany  shoes    25    635
2    germany  shoes    65    458
3    germany  shoes    96    455
4    germany  shoes    69    255

df_algeria_Product:

id  Country  Product  Qty   Sales
1   algeria  bags     25    635
2   algeria  bags     89    788
3   algeria  bags     96    455
4   algeria  bags     78    165

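I can also produce these data frames with basic pandas subsetting:

df2[(df2.Country=='India') & (df2.Product=='cotton')]

That would solve the problem, but df1 may contain a large number of unique Country/Product combinations.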

2 Answers:

Answer 0: (score: 1)

You can create a dictionary and store all the data frames in it. See the code below:

d = {}
# Build one filtered data frame per (Country, Product) row of df1.
for country, product in zip(df1.Country, df1.Product):
    name = country + '_' + product
    d[name] = df2[(df2.Country == country) & (df2.Product == product)]

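You can then access each data frame by its key, as shown below:

d['India_cotton'] will give (provided df1 spells the country as 'India', matching df2):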

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255
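
Note that the sample df1 in the question spells the country as 'india' while df2 uses 'India', so an exact comparison would return an empty data frame for that pair. The following is a minimal sketch, assuming the match is meant to be case-insensitive (the question does not say so). The frames are rebuilt from the first two rows of each group in the question's tables, and the helper name `split_by_pairs` is hypothetical:

```python
import pandas as pd

# Sample data rebuilt from the question (first two rows of each group of df2).
df1 = pd.DataFrame({
    'Country': ['india', 'germany', 'algeria'],
    'Product': ['cotton', 'shoes', 'bags'],
})
df2 = pd.DataFrame({
    'Country': ['India', 'India', 'germany', 'germany', 'algeria', 'algeria'],
    'Product': ['cotton', 'cotton', 'shoes', 'shoes', 'bags', 'bags'],
    'Qty': [25, 65, 25, 65, 25, 89],
    'Sales': [635, 335, 635, 458, 635, 788],
})


def split_by_pairs(selector, data):
    """Hypothetical helper: one sub-frame of `data` per (Country, Product)
    row of `selector`, compared case-insensitively (an assumption)."""
    country_lc = data['Country'].str.lower()
    product_lc = data['Product'].str.lower()
    result = {}
    for country, product in zip(selector['Country'], selector['Product']):
        mask = (country_lc == country.lower()) & (product_lc == product.lower())
        # Keys use the spelling found in the selector frame.
        result[f'{country}_{product}'] = data[mask]
    return result


d = split_by_pairs(df1, df2)
print(d['india_cotton'])
```

The dictionary keys follow the spelling in df1, so here the first frame is reached as d['india_cotton'] rather than d['India_cotton'].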

Answer 1: (score: 0)

Try creating two groupby objects and use the first to select from the second:

import pandas as pd

selector_df = pd.DataFrame(data=
                           {
                               'Country':'india germany algeria'.split(),
                               'Product':'cotton shoes bags'.split()
                           })

details_df = pd.DataFrame(data=
                         {
                            'Country':'india india india india germany germany germany germany algeria algeria algeria algeria'.split(),
                            'Product':'cotton cotton cotton cotton shoes shoes shoes shoes bags bags bags bags'.split(),
                            'qty':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
                         })

selectorgroups = selector_df.groupby(by=['Country', 'Product'])
datagroups = details_df.groupby(by=['Country', 'Product'])
for tag, group in selectorgroups:
    print(tag)
    try:
        print(datagroups.get_group(tag))
    except KeyError:
        print('tag does not exist in datagroup')
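
A related sketch, not part of the answer above: an inner merge on the two key columns keeps only the rows of the details frame whose (Country, Product) pair appears in the selector frame, and a single groupby then splits the result into a dictionary. The frame names are reused from the answer with smaller made-up data, and the names `matched` and `frames` are illustrative:

```python
import pandas as pd

selector_df = pd.DataFrame({
    'Country': ['india', 'germany'],
    'Product': ['cotton', 'shoes'],
})
details_df = pd.DataFrame({
    'Country': ['india', 'india', 'germany', 'algeria'],
    'Product': ['cotton', 'cotton', 'shoes', 'bags'],
    'qty': [1, 2, 5, 9],
})

# Inner join keeps only rows whose (Country, Product) pair occurs in selector_df.
# drop_duplicates guards against repeated selector rows multiplying the matches.
matched = details_df.merge(
    selector_df.drop_duplicates(), on=['Country', 'Product'], how='inner'
)

# One data frame per pair, keyed by a (Country, Product) tuple.
frames = {key: group for key, group in matched.groupby(['Country', 'Product'])}

for key, group in frames.items():
    print(key)
    print(group)
```

Pairs that are absent from the selector (here algeria/bags) never reach the dictionary, so no KeyError handling is needed when iterating over it.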