df1:
Id Country Product
1 india cotton
2 germany shoes
3 algeria bags
df2:
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
5 germany shoes 25 635
6 germany shoes 65 458
7 germany shoes 96 455
8 germany shoes 69 255
9 algeria bags 25 635
10 algeria bags 89 788
11 algeria bags 96 455
12 algeria bags 78 165
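I need to filter df2 by the Country and Product columns of df1 and create new data frames. For example, df1 contains 3 unique Country/Product combinations, so there should be 3 data frames.
Output: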
df_India_Cotton :
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
df_germany_Product:
id Country Product Qty Sales
1 germany shoes 25 635
2 germany shoes 65 458
3 germany shoes 96 455
4 germany shoes 69 255
df_algeria_Product:
id Country Product Qty Sales
1 algeria bags 25 635
2 algeria bags 89 788
3 algeria bags 96 455
4 algeria bags 78 165
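I can also filter out these data frames with basic pandas subsetting:
df2[(df2.Country=='India') & (df2.Product=='cotton')]
That solves the problem for one pair, but df1 may contain a large number of unique Country/Product combinations.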
Answer 0 (score: 1)
You can create a dictionary and store all the data frames in it. Check the following code:
d = {}
for i in range(len(df1)):
    name = df1.Country.iloc[i] + '_' + df1.Product.iloc[i]
    d[name] = df2[(df2.Country == df1.Country.iloc[i]) & (df2.Product == df1.Product.iloc[i])]
You can then look up each data frame by its key, as shown below:
d['India_cotton'] will give:
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
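Note that the sample df1 spells "india" in lowercase while df2 uses "India", so an exact string comparison would return an empty frame for that pair. A minimal sketch of a case-insensitive variant of the same dictionary idea follows; the name frames and the choice to build keys from df1's spelling are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Country": ["india", "germany", "algeria"],
    "Product": ["cotton", "shoes", "bags"],
})
df2 = pd.DataFrame({
    "Country": ["India"] * 4 + ["germany"] * 4 + ["algeria"] * 4,
    "Product": ["cotton"] * 4 + ["shoes"] * 4 + ["bags"] * 4,
    "Qty": [25, 65, 96, 78, 25, 65, 96, 69, 25, 89, 96, 78],
    "Sales": [635, 335, 455, 255, 635, 458, 455, 255, 635, 788, 455, 165],
})

# Lowercased copies of the key columns, used only for comparison;
# df2 itself is left unchanged.
country_key = df2["Country"].str.lower()
product_key = df2["Product"].str.lower()

frames = {}  # hypothetical name for the dictionary of sub-frames
for country, product in zip(df1["Country"], df1["Product"]):
    mask = (country_key == country.lower()) & (product_key == product.lower())
    # Keys follow df1's spelling, e.g. "india_cotton"
    frames[f"{country}_{product}"] = df2[mask]
```

With this sketch the keys come from df1, so the India rows are retrieved with frames["india_cotton"].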
Answer 1 (score: 0)
Try creating two groupby objects and use the first one to select from the second:
import pandas as pd

selector_df = pd.DataFrame(data=
    {
        'Country': 'india germany algeria'.split(),
        'Product': 'cotton shoes bags'.split()
    })
details_df = pd.DataFrame(data=
    {
        'Country': 'india india india india germany germany germany germany algeria algeria algeria algeria'.split(),
        'Product': 'cotton cotton cotton cotton shoes shoes shoes shoes bags bags bags bags'.split(),
        'qty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    })

selectorgroups = selector_df.groupby(by=['Country', 'Product'])
datagroups = details_df.groupby(by=['Country', 'Product'])

for tag, group in selectorgroups:
    print(tag)
    try:
        print(datagroups.get_group(tag))
    except KeyError:
        print('tag does not exist in datagroup')
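If the goal is a dictionary of frames rather than printed output, the same selection could be sketched with an inner merge followed by a single groupby. This is a swapped-in alternative, not the answerer's method; the names matched and frames are hypothetical, and the sketch assumes the key columns are spelled identically in both frames, as they are in this answer's sample data.

```python
import pandas as pd

selector_df = pd.DataFrame({
    "Country": "india germany algeria".split(),
    "Product": "cotton shoes bags".split(),
})
details_df = pd.DataFrame({
    "Country": ["india"] * 4 + ["germany"] * 4 + ["algeria"] * 4,
    "Product": ["cotton"] * 4 + ["shoes"] * 4 + ["bags"] * 4,
    "qty": list(range(1, 13)),
})

# The inner merge keeps only the rows of details_df whose
# (Country, Product) pair also appears in selector_df.
matched = details_df.merge(
    selector_df.drop_duplicates(), on=["Country", "Product"], how="inner"
)

# One sub-frame per remaining pair, keyed as "country_product".
frames = {
    f"{country}_{product}": group
    for (country, product), group in matched.groupby(["Country", "Product"])
}
```

Pairs that exist in selector_df but not in details_df simply produce no entry in frames, which replaces the KeyError handling in the loop above.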