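I have a dictionary of single-column DataFrames (Di_G). I want to use the index of each DataFrame in Di_G to look up values in another dictionary (Di_A), and then divide each value in each single-column DataFrame in Di_G by the corresponding value from Di_A.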
```python
import pandas as pd

# Data
df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,  43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})
df_2 = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})

# Join on Box. 'outer' was originally passed to set_index(), where it is not a join option;
# join() defaults to a left join. Use join(..., how='outer') to also keep box 1014
# (its Item is NaN, which groupby('Item') drops anyway).
df_J = df_1.set_index('Box').join(df_2.set_index('Box'))

# Count how many Boxes contain each Item - Count Boxes ( Item )
df_G = df_J.groupby('Item').size()
Di_A = df_G.to_dict()

# Group the Boxes by Type
Ma_G = df_J.groupby('Type')
Di_1 = {}
for name, group in Ma_G:
    Di_1[str(name)] = group

# Count how many Boxes of each Type contain each Item - Count Boxes ( Item | Type )
Di_G = {}
for k in Di_1:
    Di_G[k] = Di_1[k].groupby('Item').size()
```
I tried this:
```python
# Pr ( Type | Item ) = Count Boxes ( Item | Type ) / Count Boxes ( Item )
for k in Di_G:
    Di_G[k]['Pr'] = Di_G[k]['0'] / Di_G[k]['Index'].map(Di_A)
```
I got `KeyError: '0'`.
I tried renaming the columns in Di_G and Di_A, but that turned out to be difficult.
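The KeyError arises because `groupby('Item').size()` returns a Series indexed by Item, not a DataFrame with columns named `'0'` and `'Index'`, so `Di_G[k]['0']` looks for an index label `'0'` that does not exist. Below is a minimal sketch of the loop the question describes, keeping the dictionary layout. It assumes a plain (left) join, and the column name `Count` and the result dict `Di_P` are hypothetical names chosen here. Each Series becomes a one-column frame, and its Item index is mapped through `Di_A`, much as the original attempt intended.

```python
import pandas as pd

# Question's data; a plain join() is a left join on Box
df_1 = pd.DataFrame({'Box':  [1006, 1006, 1006, 1006, 1006, 1006, 1007, 1007, 1007, 1007,
                              1008, 1008, 1008, 1009, 1009, 1010, 1011, 1011, 1012, 1013],
                     'Item': [40, 41, 42, 43, 44, 45, 40, 43, 44, 45,
                              43, 44, 45, 40, 41, 40, 44, 45, 44, 45]})
df_2 = pd.DataFrame({'Box':  [1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014],
                     'Type': [103, 101, 102, 102, 102, 103, 103, 103, 103]})
df_J = df_1.set_index('Box').join(df_2.set_index('Box'))

# Count Boxes(Item), as a plain dict like the question's Di_A
Di_A = df_J.groupby('Item').size().to_dict()

# Count Boxes(Item | Type): each value is a Series indexed by Item (not a DataFrame)
Di_G = {str(t): g.groupby('Item').size() for t, g in df_J.groupby('Type')}

# Pr(Type | Item) = Count Boxes(Item | Type) / Count Boxes(Item)
Di_P = {}
for k, s in Di_G.items():
    df = s.to_frame('Count')                   # give the counts a real column name (hypothetical)
    items = df.index.to_series()               # the Item index as a Series, so .map() works
    df['Pr'] = df['Count'] / items.map(Di_A)   # look up each Item's total in Di_A
    Di_P[k] = df

print(Di_P['101'])
```

Because `items` shares the frame's index, the division lines up Item by Item. Dividing directly by `pd.Series(Di_A)` would also align, but it would add NaN rows for Items that a given Type never contains.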
Answer 0 (score: 1)
I think you just need `transform`:
```python
df_J.groupby(['Item','Type']).Item.transform('count')/df_J.groupby('Item').Item.transform('count')
```

```
Box
1006    0.250000
1006    0.500000
1006    1.000000
1006    0.333333
1006    0.600000
1006    0.600000
1007    0.250000
1007    0.333333
1007    0.200000
1007    0.200000
1008    0.333333
1008    0.200000
1008    0.200000
1009    0.500000
1009    0.500000
1010    0.500000
1011    0.600000
1011    0.600000
1012    0.600000
1013    0.600000
Name: Item, dtype: float64
```
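To illustrate why `transform` fits here: `size()` collapses each group to a single row, whereas `transform('count')` returns a result with the same length and index as the input. The two counts can therefore be divided row by row and stored as a column. Here is a minimal sketch on hypothetical toy data (not the question's). It counts the `Box` column rather than the grouping key itself.

```python
import pandas as pd

# Hypothetical toy data: one row per (Box, Item), with the Box's Type
df = pd.DataFrame({'Box':  [1, 1, 2, 3, 3],
                   'Item': [40, 41, 40, 40, 41],
                   'Type': ['a', 'a', 'b', 'a', 'a']})

# size() would collapse each group to one row; transform('count') instead
# repeats each group's count on every row belonging to that group
per_item_type = df.groupby(['Item', 'Type'])['Box'].transform('count')
per_item = df.groupby('Item')['Box'].transform('count')

# Row-aligned Pr(Type | Item), stored next to the original columns
df['Pr'] = per_item_type / per_item
print(df)
```

Item 40 appears in three boxes, two of them of Type `a`, so its `a` rows get 2/3 and its `b` row gets 1/3. Item 41 appears only in Type `a` boxes, so its rows get 1.0.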
Alternatively, to better match your expected output (one row per Item/Type pair):
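```python
G = df_J.groupby(['Item', 'Type']).size()
# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
G.div(G.groupby(level=0).sum(), level=0)
```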
```
Item  Type
40    101     0.250000
      102     0.500000
      103     0.250000
41    102     0.500000
      103     0.500000
42    103     1.000000
43    101     0.333333
      102     0.333333
      103     0.333333
44    101     0.200000
      102     0.200000
      103     0.600000
45    101     0.200000
      102     0.200000
      103     0.600000
dtype: float64
```
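If the question's dictionary-of-Series layout is still wanted, the MultiIndex result can be split by Type afterwards. Here is a minimal sketch, assuming pandas 1.x or later (it uses `groupby(level=...).sum()` rather than the removed `sum(level=...)`). The small `df_J` below is hypothetical stand-in data, and `P` and `Di_P` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for df_J: one row per (Box, Item) with the Box's Type
df_J = pd.DataFrame({'Item': [40, 41, 40, 40, 41, 42],
                     'Type': [103, 103, 101, 102, 102, 103]})

G = df_J.groupby(['Item', 'Type']).size()                # Count Boxes(Item | Type)
P = G.div(G.groupby(level='Item').sum(), level='Item')   # Pr(Type | Item)

# Split the (Item, Type) MultiIndex into one Series per Type, keyed like Di_G
Di_P = {str(t): s.droplevel('Type') for t, s in P.groupby(level='Type')}
print(Di_P['103'])
```

Because the division broadcasts over the `Item` level, the probabilities for each Item sum to 1 across Types. Splitting afterwards avoids looking anything up in a separate dictionary.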