Reading CSV data with Python into the same columns as in the CSV file

Time: 2018-10-09 16:00:27

Tags: python csv parsing

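I am trying to open this CSV file and parse its data into columns. The way the data was entered is what causes my problem. When I run my Python script, every line comes back as a single string wrapped in a list, like ['DATA DATA']. I want to parse the data into columns such as "Account #", "Service Address", "City", and so on, matching the column names that already appear in the file. As I said, the data is structured oddly because its column headers are stacked on top of each other. For example, the column header "Account #" has a second column header, "Rate Code", underneath it. I am not sure what the best way to do this is and would appreciate some input from the experts.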

Python script

import csv

with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    # Print every row returned by csv.reader
    for line in csv_reader:
        print(line)

Result

['                                                  XYZ COMPANY                             DATE : 09/28/18         ']
['                                                                                                            PAGE :    1             ']
['                                                      ELECTRIC BILL STATEMENT                                                        ']
['                                                                                                                                    ']
['   CUSTOMER NAME:  XYZ CUSTOMER                            SUMMARY BILL NUMBER:  12345-67890        IF YOU HAVE ANY QUESTIONS,   ']
['                                                                  CUSTOMER NUMBER:      1111111        PLEASE CONTACT:              ']
[' MAILING ADDRESS:  4122 RICHARDSON ST                                                                                                 ']
['                                                                     BILLING DATE:     09/28/18        SUMB@XYZ.COM45               ']
['                   SANFORD             FL 32771                     PAST DUE DATE:     10/09/18        (305)333-3333                ']
['                                                                                                                                    ']
['                                                                                                                                    ']
['                                                                 READ   SVC B             MAXIMUM     TOTAL DUE  METER NO   REMARKS ']
['  ACCOUNT #  SERVICE ADDRESS                            CITY     DATE   DAY C    KWH        KWD        AMOUNT                       ']
['   RATE CODE CY CUSTOMER NAME                            MAILING ADDRESS                                                            ']
[' ---------------------------------------------------------------------------------------------------------------------------------- ']
[' 11111-22222 485 JOHNSON AVE APT 1405                MIAMI    09/26/18  28 C       140                   29.11   BAT0123           ']
['  RS-1       XYZ COMPANY                             485 JOHNSON AVE                                                           ']
['                                                                                                                                    ']
[' 22222-33333 485 JOHNSON AVE APT 3541                MIAMI    09/26/18  28 C       130                   28.08   BAT0123           ']
['  RS-1       XYZ COMPANY                             485 JOHNSON AVE                                                           ']
['                                                                                                                                    ']
[' 33333-44444 485 JOHNSON AVE APT 4544                 MIAMI    09/26/18  28 C       172                   32.42   BAT0123           ']
['  RS-1       XYZ COMPANY                              485 JOHNSON AVE                                                           ']
['                                                                                                                                    ']
[' 55555-66666 485 JOHNSON ST AVE APT 1111                MIAMI    09/26/18  28 C       243                   39.81   BAT0123           ']
['  RS-1       XYZ COMPANY                              485 JOHNSON AVE                                                           ']

1 answer:

Answer 0 (score: 0)

  

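Question: I want to parse the data into columns

Note: this simple regex also splits on '-' and '/' (and, as the output shows, on '.'). You can avoid this by extending the regex as needed.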

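import re

# Match runs of word characters (letters, digits, underscore)
rc = re.compile(r'(\w+)')

with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as itxt:
    # Iterate over the file line by line, counting lines from 1
    # (itxt.readline() would iterate over the characters of the first line only)
    for n, line in enumerate(itxt, 1):
        # Rows 13 and 14 hold the header
        if n in [13, 14]:
            print(rc.findall(line))

        # Data starts at row 16; every third row after that is blank
        if n >= 16 and n % 3 > 0:
            print(rc.findall(line))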

Output

['ACCOUNT', 'SERVICE', 'ADDRESS', 'CITY', 'DATE', 'DAY', 'C', 'KWH', 'KWD', 'AMOUNT']
['RATE', 'CODE', 'CY', 'CUSTOMER', 'NAME', 'MAILING', 'ADDRESS']
['11111', '22222', '485', 'JOHNSON', 'AVE', 'APT', '1405', 'MIAMI', '09', '26', '18', '28', 'C', '140', '29', '11', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['22222', '33333', '485', 'JOHNSON', 'AVE', 'APT', '3541', 'MIAMI', '09', '26', '18', '28', 'C', '130', '28', '08', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['33333', '44444', '485', 'JOHNSON', 'AVE', 'APT', '4544', 'MIAMI', '09', '26', '18', '28', 'C', '172', '32', '42', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['55555', '66666', '485', 'JOHNSON', 'ST', 'AVE', 'APT', '1111', 'MIAMI', '09', '26', '18', '28', 'C', '243', '39', '81', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
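As the note above says, the regex can be extended so that account numbers, dates and amounts stay in one piece. The following is only a sketch of one possible extension: the character class is an assumption chosen to fit the sample rows shown in the question, and the name token is made up for illustration.

```python
import re

# Assumed pattern: a token is a run of word characters plus '#', '.', '/' and '-'.
# It fits the sample data rows; it would also match the dashed separator row
# as one long token, so it should only be applied to header and data rows.
token = re.compile(r"[\w#./-]+")

sample = (" 11111-22222 485 JOHNSON AVE APT 1405                MIAMI    "
          "09/26/18  28 C       140                   29.11   BAT0123           ")

tokens = token.findall(sample)
print(tokens)
```

With this pattern the account number, the read date and the amount are no longer broken apart, but multi-word fields such as the service address still come out as several tokens.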

Tested with Python: 3.4.2
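To get from tokens to the named columns the question asks for, one option is to split each row on runs of two or more spaces instead of on single characters, and to combine the two physical lines of each account into one record. This is a minimal sketch, not part of the original answer. It assumes that columns are always separated by at least two spaces, and that the KWD and REMARKS columns are empty, as they are in the sample. The helper parse_record and the dictionary keys are hypothetical names.

```python
import re

# Assumption: two or more spaces separate columns; single spaces stay inside a field.
GAP = re.compile(r"\s{2,}")

def parse_record(first, second):
    """Combine the two lines of one account into a dict (hypothetical helper)."""
    head = GAP.split(first.strip())
    tail = GAP.split(second.strip())
    # Account number and service address are separated by a single space only.
    account, service_address = head[0].split(" ", 1)
    # Service days and the one-letter code share one field, e.g. "28 C".
    svc_days, code = head[3].split()
    return {
        "account": account,
        "service_address": service_address,
        "city": head[1],
        "read_date": head[2],
        "svc_days": int(svc_days),
        "code": code,
        "kwh": int(head[4]),
        "amount": float(head[5]),
        "meter_no": head[6],
        "rate_code": tail[0],
        "customer_name": tail[1],
        "mailing_address": tail[2],
    }

first = (" 11111-22222 485 JOHNSON AVE APT 1405                MIAMI    "
         "09/26/18  28 C       140                   29.11   BAT0123           ")
second = ("  RS-1       XYZ COMPANY                             "
          "485 JOHNSON AVE                                      ")

record = parse_record(first, second)
print(record["account"], record["service_address"], record["amount"])
```

In the loop from the answer, the two non-blank lines of each group of three (starting at row 16) could be passed to parse_record as a pair. If a row ever has a value in KWD or REMARKS, the list positions shift and the indices above would need adjusting.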