Computing a groupby sum on a column from a file with repeated headers

Asked: 2015-10-01 07:35:12

Tags: python python-2.7 pandas sum dataframe

The text file is 65 MB and contains more than 270,000 rows of data:

Table To Be Searched MSEG
Number of hits                                                            273208
Maximum No. of Entri                                                           0
Runtime                00:24:17

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|Mat. Doc. |MatYr|MvT|Material |Plnt|SLoc|Batch     |Customer|  Amount in LC|        Amount|    Quantity|BUn|    Qty in UnE|EUn|PO        |MatYr|Mat. Doc. |Order    |Profit Ctr|SLED/BBD  |Pstng Date|Entry Date|Time    |User name  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|5000793175|2015 |101|101567   |HMU1|0001|5H03MU1A23|        |     4,421.81 |     4,421.81 |         41 |CS |           41 |CS |          |     |          |1428058  |IN1165B010|31.01.2016|04.08.2015|04.08.2015|17:35:34|WF-BATCH   |
|5000793176|2015 |101|101567   |HMU1|0001|5H03MU1A23|        |     2,372.68 |     2,372.68 |         22 |CS |           22 |CS |          |     |          |1428058  |IN1165B010|31.01.2016|04.08.2015|04.08.2015|17:35:36|WF-BATCH   |
|5000793177|2015 |101|100633   |HMU1|0001|5H04MU1R12|        |     6,746.04 |     6,746.04 |         43 |CS |           43 |CS |          |     |          |1428207  |IN1165B010|03.08.2016|04.08.2015|04.08.2015|17:35:37|WF-BATCH   |
...
|5000793197|2015 |101|160      |HMA1|0004|          |        |         0.00 |         0.00 |      2,760 |EA |        2,760 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
|5000793197|2015 |101|161      |HMA1|0004|          |        |         0.00 |         0.00 |      1,680 |EA |        1,680 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
|5000793197|2015 |101|35       |HMA1|0004|          |        |         0.00 |         0.00 |      2,160 |EA |        2,160 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table To Be Searched MSEG
Number of hits                                                            273208
Maximum No. of Entri                                                           0
Runtime                00:24:17

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Mat. Doc. |MatYr|MvT|Material |Plnt|SLoc|Batch     |Customer|  Amount in LC|        Amount|    Quantity|BUn|    Qty in UnE|EUn|PO        |MatYr|Mat. Doc. |Order    |Profit Ctr|SLED/BBD  |Pstng Date|Entry Date|Time    |User name  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|5000793197|2015 |101|90       |HMA1|0004|          |        |         0.00 |         0.00 |      6,480 |EA |        6,480 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
|5000793197|2015 |101|149      |HMA1|0004|          |        |         0.00 |         0.00 |      1,080 |EA |        1,080 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
|5000793197|2015 |101|182      |HMA1|0004|          |        |         0.00 |         0.00 |        770 |EA |          770 |EA |4900085236|     |          |         |IN1165B030|          |04.08.2015|04.08.2015|17:21:49|A81808     |
...
|5000793244|2015 |101|101772   |HMS1|0001|5H04MS1P21|        |   174,281.34 |   174,281.34 |        631 |CS |          631 |CS |          |     |          |1428186  |IN1165B058|02.11.2015|04.08.2015|04.08.2015|18:25:05|WF-BATCH   |
|5000793245|2015 |101|20000052 |HMA1|0002|0000054359|        |    95,498.88 |    49,315.20 |      4,670 |KG |        4,670 |KG |4200000840|     |          |         |IN1165B030|27.07.2016|04.08.2015|04.08.2015|18:27:44|A60694     |
|5000793247|2015 |101|60000793 |HMA1|0002|0000054360|        |   559,879.08 |   516,786.17 |  3,887.800 |KG |    3,887.800 |KG |4200006170|     |          |         |IN1165B030|31.07.2016|05.08.2015|04.08.2015|18:37:15|A60694     |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
...

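In the table above, I need to sum the column Amount in LC. The catch is that the rows should be grouped first by MvT and then by Order, so that my output looks like this: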

Mvt|Order|Sum
101|abc|1234
101|def|4321
102|qwe|0981

But I get an error:

df = pd.read_csv('C:\Users\Administrator\Documents\GitHub\quadreader\Text.txt', sep='|')
...
CParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 26
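
The error is consistent with read_csv taking its column count from the first line of the file, which is banner text with no '|' in it, and then failing on the first pipe-delimited line. One possible way around it, shown here only as a minimal sketch, is to filter the report line by line before building the DataFrame: keep only lines that start with '|', skip the dashed rules, and drop every repeat of the header. The names read_report, sum_by_mvt_and_order, WANTED and SAMPLE are hypothetical, the sample is a cut-down version of the report above with only four columns, and the code assumes Python 3 with a recent pandas rather than the Python 2.7 setup in the question.

```python
import io

import pandas as pd

# Columns needed for the grouped sum; every other column is ignored.
WANTED = ['MvT', 'Order', 'Amount in LC']

# Cut-down sample in the same layout as the report (assumption: the real
# file follows this pattern of banner, dashed rule, header, data, repeat).
SAMPLE = """\
Table To Be Searched MSEG
Number of hits    273208

----------------------------------------
|Mat. Doc. |MvT|Order    |  Amount in LC|
|--------------------------------------|
|5000793175|101|1428058  |     4,421.81 |
|5000793176|101|1428058  |     2,372.68 |
|5000793177|101|1428207  |     6,746.04 |
----------------------------------------
Table To Be Searched MSEG

----------------------------------------
|Mat. Doc. |MvT|Order    |  Amount in LC|
|--------------------------------------|
|5000793244|101|1428186  |   174,281.34 |
|--------------------------------------|
"""


def read_report(handle, wanted=WANTED):
    """Collect the wanted columns from a paged, pipe-delimited report."""
    header = None
    positions = []
    rows = []
    for line in handle:
        line = line.strip()
        # Header and data rows start with '|'; banner text, blank lines,
        # '...' and dashed rules are skipped.
        if not line.startswith('|') or line.startswith('|-'):
            continue
        # Drop the empty pieces before the first and after the last '|'.
        cells = [cell.strip() for cell in line.split('|')[1:-1]]
        if header is None:
            header = cells
            # index() returns the first match, which sidesteps the
            # duplicated MatYr / Mat. Doc. column names.
            positions = [header.index(name) for name in wanted]
        elif cells != header and len(cells) == len(header):
            rows.append([cells[i] for i in positions])
    return pd.DataFrame(rows, columns=wanted)


def sum_by_mvt_and_order(df):
    """Sum 'Amount in LC' per (MvT, Order) pair."""
    df = df.copy()
    # Amounts use ',' as the thousands separator, e.g. '4,421.81'.
    cleaned = df['Amount in LC'].str.replace(',', '', regex=False)
    df['Amount in LC'] = pd.to_numeric(cleaned, errors='coerce')
    summed = df.groupby(['MvT', 'Order'], as_index=False)['Amount in LC'].sum()
    return summed.rename(columns={'Amount in LC': 'Sum'})


result = sum_by_mvt_and_order(read_report(io.StringIO(SAMPLE)))
print(result)
```

For the real file, the same function could be fed an open file handle instead of io.StringIO, for example `with open(path) as handle: df = read_report(handle)`, where path stands for the location of the report. Rows with a blank Order end up in a group whose Order is an empty string, and whether that is the desired behaviour is for the reader to decide.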

0 answers:

No answers