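I have a dataset of "meshblocks" (small geographic units, commonly used for census data) and the crimes recorded in them. The data is currently in this format: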
DataFrame: records
Meshblock Crime
1100 Trolling
1200 Not indenting code
1300 Trolling
1400 Trolling
1200 Not indenting code
1100 Trolling
I created a new DataFrame indexed by the individual meshblocks, with one column per crime category:
DataFrame: df
Meshblock trolling not indenting code
1100
1200
1300
1400
And a list of the crime categories:
offences = ['trolling', 'not indenting code']
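A minimal sketch of how the two DataFrames and the offence list described above could be built, using the sample rows from the question. Building df with every count set to 0 is an assumption on my part; the question leaves the cells empty, but starting from 0 lets the counts be incremented directly.

```python
import pandas as pd

# The sample records from the question: one row per recorded crime
records = pd.DataFrame({
    'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
    'Crime': ['Trolling', 'Not indenting code', 'Trolling',
              'Trolling', 'Not indenting code', 'Trolling'],
})

# Crime categories, lower-cased to match the column names of df
offences = ['trolling', 'not indenting code']

# One row per distinct meshblock, one column per offence, every count starting at 0
meshblocks = sorted(records['Meshblock'].unique())
df = pd.DataFrame(0, index=pd.Index(meshblocks, name='Meshblock'), columns=offences)
print(df)
```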
What I want to do now is count how many times each offence occurs in each meshblock.
The code I have so far is:
for off in offences:
    for mb, row in df.iterrows():  # iterrows() yields (index label, row); df is indexed by Meshblock
        for _, row1 in records.iterrows():
            # if the meshblock codes match and the offence is present for the match, increment the count by 1
            if mb == row1['Meshblock'] and row1['Crime'].lower() == off:
                df.loc[mb, off] += 1  # assumes every cell of df was initialised to 0
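For comparison, a minimal sketch of the same loop-based counting done as a single pass over records instead of nested loops over both frames. It assumes records looks like the sample above, that df starts with every count at 0, and that lower-casing Crime is acceptable so that "Trolling" matches the column name "trolling".

```python
import pandas as pd

records = pd.DataFrame({
    'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
    'Crime': ['Trolling', 'Not indenting code', 'Trolling',
              'Trolling', 'Not indenting code', 'Trolling'],
})
offences = ['trolling', 'not indenting code']

# Zero-initialised count table: one row per meshblock, one column per offence
df = pd.DataFrame(0,
                  index=pd.Index(sorted(records['Meshblock'].unique()), name='Meshblock'),
                  columns=offences)

# One pass over the records: each record adds 1 to its (meshblock, offence) cell
for mb, crime in zip(records['Meshblock'], records['Crime']):
    off = crime.lower()  # 'Trolling' -> 'trolling' to match the column names
    if off in df.columns:  # skip crimes that are not in the offence list
        df.loc[mb, off] += 1

print(df)
```

This visits each record once, so the work grows with the number of records rather than with meshblocks times records times offences.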
The final DataFrame should look like this:
DataFrame: df
Meshblock trolling not indenting code
1100 2
1200 2
1300 1
1400 1
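One way to get exactly this layout without any explicit loop is to count rows per (Meshblock, Crime) pair with groupby and then move the Crime level into columns. A minimal sketch, assuming the sample records above; the lower-casing and the reindex onto the offences list are only there to match the column names used in the question, and missing pairs are filled with 0.

```python
import pandas as pd

records = pd.DataFrame({
    'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
    'Crime': ['Trolling', 'Not indenting code', 'Trolling',
              'Trolling', 'Not indenting code', 'Trolling'],
})
offences = ['trolling', 'not indenting code']

counts = (
    records.groupby(['Meshblock', 'Crime']).size()  # number of rows per (Meshblock, Crime)
    .unstack(fill_value=0)                          # Crime values become columns, gaps become 0
    .rename(columns=str.lower)                      # 'Trolling' -> 'trolling'
    .reindex(columns=offences, fill_value=0)        # same column order as the offences list
)
print(counts)
```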
Answer (score: 1)
You should be able to solve this more easily with pivot_table:
import pandas as pd

a = {'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
     'Crime': ['Trolling', 'Not indenting code', 'Trolling', 'Trolling', 'Not indenting code', 'Trolling']}
df = pd.DataFrame(a)
# one row per meshblock, one column per crime; aggfunc=len counts the records falling in each cell
df = df.pivot_table(columns='Crime', index='Meshblock', aggfunc=len)
print(df)
Output:
Crime Not indenting code Trolling
Meshblock
1100 NaN 2.0
1200 2.0 NaN
1300 NaN 1.0
1400 NaN 1.0
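Cells with no matching records come out as NaN here, which also turns the counts into floats. A minimal sketch of an alternative that swaps in pd.crosstab (a different function from the pivot_table call above) and yields 0 and integer counts directly, assuming the same sample data:

```python
import pandas as pd

a = {'Meshblock': [1100, 1200, 1300, 1400, 1200, 1100],
     'Crime': ['Trolling', 'Not indenting code', 'Trolling', 'Trolling', 'Not indenting code', 'Trolling']}
df = pd.DataFrame(a)

# crosstab counts how often each (Meshblock, Crime) pair occurs;
# pairs that never occur get 0 rather than NaN
table = pd.crosstab(df['Meshblock'], df['Crime'])
print(table)
```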