Example df:
    company   vehicle   registration
0   company1  truck     abc123
1   company1  truck     abcdefg
2   company1  car       234cse
3   company1  forklift  NaN
4   company1  truck     93ds2
5   company2  car       rentall
6   company2  car       rental2
7   company2  truck     rentals
8   company2  truck     rental*
9   company2  car       rental5
10  company3  truck     fdsa23
11  company3  truck     asdf4
12  company3  other     fdsag3
13  company3  other     NaN
14  company3  truck     gls319d
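My goal is to get a count of each vehicle type per company (the registration and vehicle columns would then be dropped).
Here is what I've tried: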
import pandas as pd
df = pd.read_csv('path to csv', header=0)
df.loc[df.vehicle == 'truck', 'trucks'] = 1
df.loc[df.vehicle == 'car', 'cars'] = 1
df.loc[df.vehicle != 'truck', 'others'] = 1
df.loc[df.vehicle != 'cars', 'others'] = 1
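From there, I assumed some kind of groupby and sum would combine the rows and columns.
Unfortunately, this only fills in "1" values against the vehicle column instead of putting them in the respective columns.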
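The groupby-and-sum idea does work once each row carries a 0/1 flag in its own column. In the attempt above, the two `!=` conditions between them match every row (every vehicle is either not a truck or not a car), so `others` ends up as 1 everywhere; `'cars'` also never matches, because the value in the data is `'car'`. A minimal sketch, assuming the frame is built inline from the example data instead of being read from a CSV:

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question (values copied as shown).
df = pd.DataFrame({
    "company": ["company1"] * 5 + ["company2"] * 5 + ["company3"] * 5,
    "vehicle": ["truck", "truck", "car", "forklift", "truck",
                "car", "car", "truck", "truck", "car",
                "truck", "truck", "other", "other", "truck"],
    "registration": ["abc123", "abcdefg", "234cse", np.nan, "93ds2",
                     "rentall", "rental2", "rentals", "rental*", "rental5",
                     "fdsa23", "asdf4", "fdsag3", np.nan, "gls319d"],
})

# One 0/1 indicator column per output category. "others" must be
# "not a truck AND not a car", combined with &, not two separate assignments.
df["trucks"] = (df["vehicle"] == "truck").astype(int)
df["cars"] = (df["vehicle"] == "car").astype(int)
df["others"] = ((df["vehicle"] != "truck") & (df["vehicle"] != "car")).astype(int)

# Summing the indicators per company gives the counts; vehicle and
# registration are not selected, so they drop out of the result.
counts = df.groupby("company")[["trucks", "cars", "others"]].sum()
print(counts)
```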
My desired output is:
company   trucks  cars  others
company1  3       1     1
company2  2       3     0
company3  3       0     2
I'm sure this has probably been answered already, but my google-fu is weak this morning.
Cheers.
Answer (score: 5)
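First use Series.map with a dictionary of the categories you want to keep, then replace all non-matching values (NaN) with Series.fillna. Then pass the result to crosstab, and if the order of the output columns matters, add DataFrame.reindex.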
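The answer's own code is not reproduced on this page. A minimal sketch of the steps it describes, assuming the example frame is built inline and that the dictionary (called `mapping` here, a name chosen for illustration) covers only `truck` and `car`:

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question (values copied as shown).
df = pd.DataFrame({
    "company": ["company1"] * 5 + ["company2"] * 5 + ["company3"] * 5,
    "vehicle": ["truck", "truck", "car", "forklift", "truck",
                "car", "car", "truck", "truck", "car",
                "truck", "truck", "other", "other", "truck"],
    "registration": ["abc123", "abcdefg", "234cse", np.nan, "93ds2",
                     "rentall", "rental2", "rentals", "rental*", "rental5",
                     "fdsa23", "asdf4", "fdsag3", np.nan, "gls319d"],
})

# Series.map: vehicle types in the dict get their output column name;
# anything else ("forklift", "other") becomes NaN, which fillna turns into "others".
mapping = {"truck": "trucks", "car": "cars"}
category = df["vehicle"].map(mapping).fillna("others")

# crosstab counts rows per (company, category) pair; reindex fixes the
# column order and would insert 0 for a category that never occurs.
out = (pd.crosstab(df["company"], category)
         .reindex(columns=["trucks", "cars", "others"], fill_value=0))
print(out)
```

Compared with building indicator columns by hand, this keeps the category rule in one dictionary, and `reindex` is only needed because `crosstab` sorts its columns alphabetically.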