Convert a report format to a dataset in Python

Time: 2019-11-26 13:28:06

Tags: python pandas

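I am trying to convert a report's output into a dataset for analysis. I don't have access to the database the report is pulled from, so I need to do this conversion in Python.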

Below are examples of the input dataset and the desired final dataset.

The report is organized by month. How can I add a column containing the month each row belongs to?

import pandas as pd
data_input = [['2015. Aug'], 
        ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'], 
        ['C.DIGNITY', '01ST JUL', '02ND JUL', 'QATAR LAND', '1 MB'],
        ['MARNA CENTAURUS', '06TH AUG', '07TH AUG', 'BASRAH HEAVY CRUDE OIL', '1 MB'],
        ['C.MIGHTY', '05TH AUG', '06TH AUG', 'ARABIAN MEDIUM,ARABIAN HEAVY,ARABIAN LIGHT', '1.5 MB'],
        ['PAVEL CHERNYSH', '07TH AUG', '08TH AUG', 'SOKOL CRUDE OIL', '790 KB'],
        ['2015. Sep'], 
        ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'],
        ['C.EMPEROR', '01ST SEP', '03RD SEP', 'ARABIAN HEAVY,ARABIAN LIGHT', '1.53 MB'],
        ['DIONA', '03RD SEP', '05TH SEP', 'FOROZAN CRUDE OIL', '2 MB'],
        ['C.FREEDOM', '11TH SEP', '13TH SEP', 'KUWAIT CRUDE OIL,MURBAN CRUDE OIL', '1.27 MB'],
        ['IDEMITSU MARU', '13TH SEP', '15TH SEP', 'QATAR LAND CRUDE,QATAR MARINE CRUDE OIL,MURBAN CRUDE OIL', '2 MB']]
df_input = pd.DataFrame(data_input)

data_final = [['C.DIGNITY', '01ST JUL', '02ND JUL', 'QATAR LAND', '1 MB', '2015. Aug'],
        ['MARNA CENTAURUS', '06TH AUG', '07TH AUG', 'BASRAH HEAVY CRUDE OIL', '1 MB', '2015. Aug'],
        ['C.MIGHTY', '05TH AUG', '06TH AUG', 'ARABIAN MEDIUM,ARABIAN HEAVY,ARABIAN LIGHT', '1.5 MB', '2015. Aug'],
        ['PAVEL CHERNYSH', '07TH AUG', '08TH AUG', 'SOKOL CRUDE OIL', '790 KB', '2015. Aug'],
        ['C.EMPEROR', '01ST SEP', '03RD SEP', 'ARABIAN HEAVY,ARABIAN LIGHT', '1.53 MB', '2015. Sep'],
        ['DIONA', '03RD SEP', '05TH SEP', 'FOROZAN CRUDE OIL', '2 MB', '2015. Sep'],
        ['C.FREEDOM', '11TH SEP', '13TH SEP', 'KUWAIT CRUDE OIL,MURBAN CRUDE OIL', '1.27 MB', '2015. Sep'],
        ['IDEMITSU MARU', '13TH SEP', '15TH SEP', 'QATAR LAND CRUDE,QATAR MARINE CRUDE OIL,MURBAN CRUDE OIL', '2 MB', '2015. Sep']]

df_final = pd.DataFrame(data_final, columns=['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY', 'REP_MONTH'])
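The month rows are easy to spot once the report is in a DataFrame: `pd.DataFrame` pads the short one-item rows with missing values in the remaining columns. A minimal check on a shortened copy of the report (the `is_month` name is illustrative):

```python
import pandas as pd

# Shortened copy of the report: month label, header, data row, repeated per month
rows = [['2015. Aug'],
        ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'],
        ['C.DIGNITY', '01ST JUL', '02ND JUL', 'QATAR LAND', '1 MB'],
        ['2015. Sep'],
        ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'],
        ['DIONA', '03RD SEP', '05TH SEP', 'FOROZAN CRUDE OIL', '2 MB']]
df = pd.DataFrame(rows)

# Short rows are padded with missing values, so a month row has only column 0 filled
is_month = df.iloc[:, 1:].isna().all(axis=1)
print(is_month.tolist())  # → [True, False, False, True, False, False]
```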

1 answer:

Answer 0 (score: 1)

First, you need to normalize the data:

import pandas as pd

header = ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY']
data_final = []
month = None  # no month label seen yet
for row in data_input:
    if row == header:  # repeated header rows carry no data, so skip them
        continue
    if len(row) == 1:  # a one-item row is the month label for the rows that follow
        month = row[0]
        continue
    data_final.append(row + [month])  # append the most recent month label to the end of the row
df_final = pd.DataFrame(data_final, columns=header + ['REP_MONTH'])
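If the report has already been loaded into a DataFrame (as `df_input` in the question), the same idea can be expressed with pandas operations instead of a loop: mark the month rows, fill their label downward with `ffill`, then drop the month and header rows. This is a sketch, assuming every month row has exactly one value and every header row matches `header` exactly; names such as `is_month`, `is_header` and `result` are illustrative:

```python
import pandas as pd

header = ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY']

# Shortened copy of the report from the question
data_input = [['2015. Aug'],
              ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'],
              ['C.DIGNITY', '01ST JUL', '02ND JUL', 'QATAR LAND', '1 MB'],
              ['MARNA CENTAURUS', '06TH AUG', '07TH AUG', 'BASRAH HEAVY CRUDE OIL', '1 MB'],
              ['2015. Sep'],
              ['VESSEL', 'ARR', 'DEP', 'CARGO', 'QTY'],
              ['C.EMPEROR', '01ST SEP', '03RD SEP', 'ARABIAN HEAVY,ARABIAN LIGHT', '1.53 MB'],
              ['DIONA', '03RD SEP', '05TH SEP', 'FOROZAN CRUDE OIL', '2 MB']]

df = pd.DataFrame(data_input)  # one-item month rows are padded with missing values

# Assumption: a month row has only its first column filled
is_month = df.iloc[:, 1:].isna().all(axis=1)
# Assumption: a header row matches the header list column by column
is_header = df.iloc[:, :5].eq(header).all(axis=1)

# Keep the label on month rows (missing elsewhere), then fill it down to the rows below
df['REP_MONTH'] = df[0].where(is_month).ffill()

# Drop the month and header rows, renumber, and name the columns
result = df.loc[~is_month & ~is_header].reset_index(drop=True)
result.columns = header + ['REP_MONTH']
print(result)
```

Compared with the loop, this keeps the work inside pandas; the trade-off is that it relies on the report always having exactly five columns.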