Detecting duplicates and creating summary rows

Time: 2015-09-17 04:27:27

Tags: python pandas

I have a regular incoming CSV that looks like this (simplified):

Published   Station         TypeFuel    Price
1/09/2015   BP Seaford      ULP         129.9
1/09/2015   BP Seaford      Diesel      133.9
1/09/2015   BP Seaford      Gas         156.9
1/09/2015   Shell Newhaven  ULP         139.9
1/09/2015   Shell Newhaven  Diesel      150.9
1/09/2015   7-Eleven Malaga ULP         135.9
1/09/2015   7-Eleven Malaga Diesel      155.9
2/10/2015   BP Seaford      ULP         138.9
2/10/2015   BP Seaford      Diesel      133.6
2/10/2015   BP Seaford      Gas         157.9

...... many more rows hidden. There are about 200 stations, each reporting daily over 20-30 days.

I need to summarise it so that it looks like this:

Published   Station         ULP     Diesel  Gas
1/09/2015   BP Seaford      129.9   133.9   156.9
1/09/2015   Shell Newhaven  139.9   150.9   
1/09/2015   7-Eleven Malaga 135.9   155.9   
2/09/2015   BP Seaford      138.9   133.6   157.9

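I have only taken a few steps through the pandas tutorial and I am new to Python as well, but I believe the two should be able to get me through this task.

I believe I need to iterate through the CSV and, whenever Published and Station match, create a new row that transposes the ULP/Diesel/Gas prices into new columns.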

1 answer:

Answer 0 (score: 6)

You are looking for DataFrame.pivot_table(). Pivot on the columns 'Published' and 'Station', take the values of the TypeFuel column as the new columns of the pivot table, and use Price as its values. Example:

In [5]: df
Out[5]:
   Published          Station TypeFuel  Price
0  1/09/2015       BP Seaford      ULP  129.9
1  1/09/2015       BP Seaford   Diesel  133.9
2  1/09/2015       BP Seaford      Gas  156.9
3  1/09/2015   Shell Newhaven      ULP  139.9
4  1/09/2015   Shell Newhaven   Diesel  150.9
5  1/09/2015  7-Eleven Malaga      ULP  135.9
6  1/09/2015  7-Eleven Malaga   Diesel  155.9
7  2/10/2015       BP Seaford      ULP  138.9
8  2/10/2015       BP Seaford   Diesel  133.6
9  2/10/2015       BP Seaford      Gas  157.9

In [7]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price')
Out[7]:
TypeFuel                   Diesel    Gas    ULP
Published Station
1/09/2015 7-Eleven Malaga   155.9    NaN  135.9
          BP Seaford        133.9  156.9  129.9
          Shell Newhaven    150.9    NaN  139.9
2/10/2015 BP Seaford        133.6  157.9  138.9

If you do not want Published and Station to be the index, you can call .reset_index() on the result of pivot_table() to reset the index.
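For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the pivot followed by .reset_index(). It assumes the real file is comma-separated (the question only shows a whitespace-aligned excerpt), and the names CSV_TEXT and summary are made up for illustration. The column reordering at the end is only there to match the layout asked for in the question.

```python
import io

import pandas as pd

# Sample rows copied from the question. The comma delimiter is an assumption;
# adjust `sep=` in read_csv if the real file uses tabs or something else.
CSV_TEXT = """Published,Station,TypeFuel,Price
1/09/2015,BP Seaford,ULP,129.9
1/09/2015,BP Seaford,Diesel,133.9
1/09/2015,BP Seaford,Gas,156.9
1/09/2015,Shell Newhaven,ULP,139.9
1/09/2015,Shell Newhaven,Diesel,150.9
1/09/2015,7-Eleven Malaga,ULP,135.9
1/09/2015,7-Eleven Malaga,Diesel,155.9
2/10/2015,BP Seaford,ULP,138.9
2/10/2015,BP Seaford,Diesel,133.6
2/10/2015,BP Seaford,Gas,157.9
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# One row per (Published, Station); one column per fuel type.
summary = df.pivot_table(
    index=["Published", "Station"],
    columns="TypeFuel",
    values="Price",
).reset_index()

# Drop the leftover "TypeFuel" label on the column axis.
summary.columns.name = None

# Put the fuel columns in the order used in the question.
summary = summary[["Published", "Station", "ULP", "Diesel", "Gas"]]

print(summary)
```

Note that pivot_table() aggregates with the mean by default, so if the same Published/Station/TypeFuel combination appears more than once, the prices are averaged silently. If duplicates should be handled differently, pass aggfunc explicitly (for example aggfunc='first').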