Conditional counting using two DataFrames

Time: 2019-10-07 22:35:14

Tags: python pandas dataframe

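I have a dataset containing "meshblocks" (small geographic units, commonly used for census data) and the crimes recorded in them. Currently the data is in this format: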

DataFrame records

Meshblock   Crime
1100        Trolling
1200        Not indenting code
1300        Trolling
1400        Trolling
1200        Not indenting code
1100        Trolling

I have created a new DataFrame that is indexed by the individual meshblocks, with one column per crime category.

DataFrame df

Meshblock   trolling   not indenting code
1100
1200
1300
1400

I also have a list of the crime categories:

offences = ['trolling', 'not indenting code']

What I want to do now is count how many times each crime occurs in each meshblock.

The code I have so far is:

for off in offences:
    for col, row in df.iterrows():
        for col1, row1 in records.iterrows():
            # if the codes match and the offence is present for the match then we increment the count by 1
            if row['Meshblock'] == row1['Meshblock'] and row1['Crime'] == off:
                # something here that will increment the count by 1 where there is a match

The final DataFrame should look like this:

DataFrame df

Meshblock   trolling   not indenting code
1100            2
1200                          2
1300            1
1400            1

1 Answer:

Answer 0 (score: 1)

You should be able to solve this more easily with pivot_table:

import pandas as pd

# Rebuild the records table from the question
a = {'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
     'Crime': ['Trolling', 'Not indenting code', 'Trolling', 'Trolling', 'Not indenting code', 'Trolling']}
df = pd.DataFrame(a)
# Count the rows for each (Meshblock, Crime) pair
df = df.pivot_table(columns='Crime', index='Meshblock', aggfunc=len)
print(df)

Output:

Crime      Not indenting code  Trolling
Meshblock
1100                      NaN       2.0
1200                      2.0       NaN
1300                      NaN       1.0
1400                      NaN       1.0
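
The NaN cells in this output mark meshblock/crime pairs that never occur in the data, and their presence is also why the counts are shown as floats. If integer counts with zeros are preferred, one possible alternative (not the method used in the answer above) is pd.crosstab, or a groupby followed by size and unstack. The sketch below assumes the question's data lives in a DataFrame named records with the columns Meshblock and Crime; the names counts and counts_alt are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumed to be held in `records`).
records = pd.DataFrame({
    'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
    'Crime': ['Trolling', 'Not indenting code', 'Trolling',
              'Trolling', 'Not indenting code', 'Trolling'],
})

# Cross-tabulate: one row per meshblock, one column per crime.
# Each cell holds the number of matching records, 0 when there are none.
counts = pd.crosstab(records['Meshblock'], records['Crime'])

# Alternative route: count the size of each (Meshblock, Crime) group,
# then spread the crimes into columns, filling missing pairs with 0.
counts_alt = (
    records.groupby(['Meshblock', 'Crime'])
    .size()
    .unstack(fill_value=0)
)

print(counts)
```

Both results keep the meshblocks as the index, so a single count can be read with something like counts.loc[1100, 'Trolling']. Note that the column labels follow the spelling in the Crime column, so they would need renaming if they have to match a separately maintained list such as offences.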