Dividing columns in pandas using DataFrames of different sizes

Time: 2016-02-22 13:55:26

Tags: python pandas

I am facing a small pandas challenge that I am having a hard time figuring out.

I created two DataFrames with the following code:
df5 = dataFrame[['PdDistrict' , 'Category']]
df5 = df5[pd.notnull(df5['PdDistrict'])]
df5 = df5.groupby(['Category', 'PdDistrict']).size()
df5 = df5.reset_index()
df5 = df5.sort_values(['PdDistrict',0], ascending=False)

df6 = df5.groupby('PdDistrict')[0].sum()
df6 = df6.reset_index()

This gives me two DataFrames. df5 contains the number of times a particular category occurs in a particular district. For example:

'Category'   'PdDistrict'  'count'
   Drugs       Bayview       200
   Theft       Bayview       200
   Gambling    Bayview       200
   Drugs       CENTRAL       300
   Theft       CENTRAL       300
   Gambling    CENTRAL       300

df6 contains the total count across all categories for each PdDistrict.

It looks like this:

'PdDistrict' 'total count'
  Bayview        600
  CENTRAL        900

Now what I want is for df5 to look like this:

'Category'   'PdDistrict'  'count'      'Average'
   Drugs       Bayview       200           0.33
   Theft       Bayview       200           0.33
   Gambling    Bayview       200           0.33
   Drugs       CENTRAL       300           0.33
   Theft       CENTRAL       300           0.33
   Gambling    CENTRAL       300           0.33

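So basically I want to take the count from df5 and divide it by the total count from df6 for the same district. How can I do that? This is what I tried: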

res = df5.set_index('PdDistrict', append = False) / df6.set_index('PdDistrict', append = False)

The above gives NaN for Category.

1 Answer:

Answer 0: (score: 2)

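You can add the total count column to your first df and then perform the calculation (here df and df1 stand for your df5 and df6, with the count columns named count and total count):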

In [45]:
df['total count'] = df['PdDistrict'].map(df1.set_index('PdDistrict')['total count'])
df

Out[45]:
   Category PdDistrict  count  total count
0     Drugs    Bayview    200          600
1     Theft    Bayview    200          600
2  Gambling    Bayview    200          600
3     Drugs    CENTRAL    300          900
4     Theft    CENTRAL    300          900
5  Gambling    CENTRAL    300          900

In [46]:
df['Average'] = df['count']/df['total count']
df

Out[46]:
   Category PdDistrict  count  total count   Average
0     Drugs    Bayview    200          600  0.333333
1     Theft    Bayview    200          600  0.333333
2  Gambling    Bayview    200          600  0.333333
3     Drugs    CENTRAL    300          900  0.333333
4     Theft    CENTRAL    300          900  0.333333
5  Gambling    CENTRAL    300          900  0.333333
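
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample values from the question and the column names count and total count. The names totals and share are only illustrative, and the transform line is an alternative I am adding, not part of the steps above.

```python
import pandas as pd

# Sample data copied from the question's df5 example.
df = pd.DataFrame({
    "Category": ["Drugs", "Theft", "Gambling"] * 2,
    "PdDistrict": ["Bayview"] * 3 + ["CENTRAL"] * 3,
    "count": [200] * 3 + [300] * 3,
})

# Per-district totals, playing the role of the question's df6.
df1 = (
    df.groupby("PdDistrict")["count"].sum()
      .rename("total count")
      .reset_index()
)

# Same steps as above: look up each row's district total, then divide.
totals = df1.set_index("PdDistrict")["total count"]  # illustrative name
df["total count"] = df["PdDistrict"].map(totals)
df["Average"] = df["count"] / df["total count"]

# Alternative (assumption, not from the answer above): skip the second
# frame and broadcast each group's sum back onto its rows with transform.
share = df["count"] / df.groupby("PdDistrict")["count"].transform("sum")

print(df)
```

Both routes should give the same per-row ratio. map keeps the totals visible as a column, while transform avoids building the second frame at all.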