Extracting and grouping data in Python

Time: 2015-11-15 15:31:55

Tags: python dataframe grouping

I have a CSV dataset that I imported using the Pandas read_csv function. When I run .head(), I get the following table output:

    LSOA code             Crime type
0   E01006687               Burglary
1   E01007229  Anti-social behaviour
2   E01007229  Anti-social behaviour
3   E01007229  Anti-social behaviour
4   E01007229               Burglary
5   E01007229            Other theft
6   E01007229            Other theft
7   E01007229            Shoplifting
8   E01007229  Theft from the person
9   E01007230  Anti-social behaviour
10  E01007230  Anti-social behaviour
11  E01007230  Anti-social behaviour
12  E01007230  Anti-social behaviour
13  E01007230  Anti-social behaviour
14  E01007230  Anti-social behaviour
15  E01007230  Anti-social behaviour
16  E01007230  Anti-social behaviour
17  E01007230  Anti-social behaviour
18  E01007230  Anti-social behaviour
19  E01007230  Anti-social behaviour

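The table contains over 33,000 rows. What I need to do is get all the unique values of 'LSOA code' (there are 207 of them), then for each 'LSOA code' I need a count of the occurrences of each 'Crime type' (there are about 30 of them), and finally the total number of crimes for each LSOA code.

For example, I would like an output table of the following kind, where 'LSOA code' is the index column: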

LSOA code | Burglary | Anti-social Behavior | Bicycle Theft | Assault ... | Total
E01000067 | 32 | 21 | 8 | 43 ... | 1023
E01000043 | 98 | 65 | 5 | 73 ... | 2308
E01000237 | 38 | 34 | 12 | 92 ... | 897
E01000038 | 82 | 28 | 3 | 18 ... | 2147

I have managed to get the LSOA codes into a DataFrame, along with the total number of crimes in each LSOA, using the following:

WirralCrimes = Crimes['LSOA code'].value_counts()
CrimeDF = pd.DataFrame(pd.Series(WirralCrimes))
CrimeDF.columns = ["Count"]

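...but I can't figure out how to get each crime type into its own column and sum the occurrences for each LSOA.

Can anyone point me to what I should be doing?

Many thanks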

1 Answer:

Answer 0: (score: 0)

If your data looks like this, the following should work:

import pandas as pd

df = pd.DataFrame({'LSOA code': ['E01006687','E01007229','E01007229','E01007229','E01007229','E01007229','E01007229','E01007229','E01007230','E01007230'],
                   'Crime type': ['Burglary','Anti-social behaviour','Anti-social behaviour','Anti-social behaviour','Burglary','Other theft','Other theft','Shoplifting','Theft from the person','Anti-social behaviour']})

# Helper column so that every row counts as one crime
df['count'] = 1

# One row per LSOA code, one column per crime type.
# fill_value=0 (an addition here) shows 0 instead of NaN for combinations that never occur.
table = pd.pivot_table(df, index='LSOA code', columns='Crime type', values='count', aggfunc='sum', fill_value=0)
table['total'] = table.sum(axis=1)
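
As an alternative to the helper column, pandas.crosstab can build the same kind of count table directly from the two columns. This is only a minimal sketch: the frame name crimes and the five sample rows are made up for illustration, and the "Total" label is simply chosen to match the desired output in the question.

```python
import pandas as pd

# Illustrative sample in the same shape as the question's data (not the real dataset)
crimes = pd.DataFrame({
    "LSOA code": ["E01006687", "E01007229", "E01007229", "E01007229", "E01007230"],
    "Crime type": ["Burglary", "Anti-social behaviour", "Anti-social behaviour",
                   "Burglary", "Anti-social behaviour"],
})

# Count how often each crime type occurs within each LSOA code.
# margins=True appends a totals row and a totals column, both labelled by margins_name.
table = pd.crosstab(crimes["LSOA code"], crimes["Crime type"],
                    margins=True, margins_name="Total")

# Drop the extra totals row so that only the per-LSOA rows remain
table = table.drop(index="Total")

print(table)
```

Combinations that never occur come out as 0 here, so no separate fill step is needed.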