I have a regular incoming CSV that looks like this (simplified):
Published Station TypeFuel Price
1/09/2015 BP Seaford ULP 129.9
1/09/2015 BP Seaford Diesel 133.9
1/09/2015 BP Seaford Gas 156.9
1/09/2015 Shell Newhaven ULP 139.9
1/09/2015 Shell Newhaven Diesel 150.9
1/09/2015 7-Eleven Malaga ULP 135.9
1/09/2015 7-Eleven Malaga Diesel 155.9
2/10/2015 BP Seaford ULP 138.9
2/10/2015 BP Seaford Diesel 133.6
2/10/2015 BP Seaford Gas 157.9
... and many more rows hidden. It covers about 200 stations, reporting daily for 20-30 days.
I need to summarise it so that it looks like this:
Published Station ULP Diesel Gas
1/09/2015 BP Seaford 129.9 133.9 156.9
1/09/2015 Shell Newhaven 139.9 150.9
1/09/2015 7-Eleven Malaga 135.9 155.9
2/09/2015 BP Seaford 138.9 133.6 157.9
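I have only worked through a few steps of the Pandas tutorial and I am new to Python as well, but I am confident these two should help me with this task.
I believe I need to iterate through the CSV and, whenever Published and Station match, create a new row that moves the ULP/Diesel/Gas prices into new columns.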
Answer 0 (score: 6)
You are looking for DataFrame.pivot_table(): pivot on the columns 'Published' and 'Station', take the values of the TypeFuel column as the new columns of the pivot table, and use Price as its values. Example:
In [5]: df
Out[5]:
Published Station TypeFuel Price
0 1/09/2015 BP Seaford ULP 129.9
1 1/09/2015 BP Seaford Diesel 133.9
2 1/09/2015 BP Seaford Gas 156.9
3 1/09/2015 Shell Newhaven ULP 139.9
4 1/09/2015 Shell Newhaven Diesel 150.9
5 1/09/2015 7-Eleven Malaga ULP 135.9
6 1/09/2015 7-Eleven Malaga Diesel 155.9
7 2/10/2015 BP Seaford ULP 138.9
8 2/10/2015 BP Seaford Diesel 133.6
9 2/10/2015 BP Seaford Gas 157.9
In [7]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price')
Out[7]:
TypeFuel Diesel Gas ULP
Published Station
1/09/2015 7-Eleven Malaga 155.9 NaN 135.9
BP Seaford 133.9 156.9 129.9
Shell Newhaven 150.9 NaN 139.9
2/10/2015 BP Seaford 133.6 157.9 138.9
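For anyone who wants to reproduce the session above outside IPython, here is a minimal self-contained sketch. It assumes the sample rows from the question are typed in by hand as a list of tuples (the variable names rows and summary are only illustrative); with the real file you would load the frame from the CSV instead.

```python
import pandas as pd

# Sample rows copied from the question; real data would be read from the CSV.
rows = [
    ("1/09/2015", "BP Seaford", "ULP", 129.9),
    ("1/09/2015", "BP Seaford", "Diesel", 133.9),
    ("1/09/2015", "BP Seaford", "Gas", 156.9),
    ("1/09/2015", "Shell Newhaven", "ULP", 139.9),
    ("1/09/2015", "Shell Newhaven", "Diesel", 150.9),
    ("1/09/2015", "7-Eleven Malaga", "ULP", 135.9),
    ("1/09/2015", "7-Eleven Malaga", "Diesel", 155.9),
    ("2/10/2015", "BP Seaford", "ULP", 138.9),
    ("2/10/2015", "BP Seaford", "Diesel", 133.6),
    ("2/10/2015", "BP Seaford", "Gas", 157.9),
]
df = pd.DataFrame(rows, columns=["Published", "Station", "TypeFuel", "Price"])

# One row per (Published, Station) pair, one column per fuel type.
# Stations that report no price for a fuel get NaN in that column.
summary = df.pivot_table(
    index=["Published", "Station"], columns="TypeFuel", values="Price"
)
print(summary)
```

Note that pivot_table() aggregates with the mean by default, so if the same station ever reported the same fuel twice on one day, the two prices would be averaged rather than raising an error.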
If you do not want Published and Station to be the index, you can call .reset_index() on the result of pivot_table() to reset the index.
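A short sketch of that reset_index() step, using a few of the rows from the question. Clearing the columns' name and reordering the columns to ULP, Diesel, Gas are extra steps assumed here so the result matches the layout asked for in the question; the name flat is only illustrative.

```python
import pandas as pd

# A few sample rows from the question, enough to show the reshaping.
df = pd.DataFrame(
    [
        ("1/09/2015", "BP Seaford", "ULP", 129.9),
        ("1/09/2015", "BP Seaford", "Diesel", 133.9),
        ("1/09/2015", "BP Seaford", "Gas", 156.9),
        ("1/09/2015", "Shell Newhaven", "ULP", 139.9),
        ("1/09/2015", "Shell Newhaven", "Diesel", 150.9),
    ],
    columns=["Published", "Station", "TypeFuel", "Price"],
)

# Pivot, then turn the (Published, Station) index back into ordinary columns.
flat = df.pivot_table(
    index=["Published", "Station"], columns="TypeFuel", values="Price"
).reset_index()

# Drop the leftover "TypeFuel" label on the column axis (cosmetic only).
flat.columns.name = None

# Reorder the columns to match the layout requested in the question.
flat = flat[["Published", "Station", "ULP", "Diesel", "Gas"]]
print(flat)
```

The result is a flat table with one row per date and station, and it can be written back out with flat.to_csv() if a summary file is needed.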