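I am trying to open this CSV file and parse the data into columns. The problem is the way the data is laid out. When I run my Python script, every line comes back as a single string, like ['DATA DATA']. I want to parse the data into columns such as "Account #", "Service Address", "City", and so on, matching the column names that already exist in the file. As I said, the data is structured oddly because the column headers are stacked: for example, the column header "Account #" has a second column header, "Rate Code", underneath it. I am not sure what the best way to do this is and would appreciate some input from the experts.
Python script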
import csv

with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)
Result
[' XYZ COMPANY DATE : 09/28/18 ']
[' PAGE : 1 ']
[' ELECTRIC BILL STATEMENT ']
[' ']
[' CUSTOMER NAME: XYZ CUSTOMER SUMMARY BILL NUMBER: 12345-67890 IF YOU HAVE ANY QUESTIONS, ']
[' CUSTOMER NUMBER: 1111111 PLEASE CONTACT: ']
[' MAILING ADDRESS: 4122 RICHARDSON ST ']
[' BILLING DATE: 09/28/18 SUMB@XYZ.COM45 ']
[' SANFORD FL 32771 PAST DUE DATE: 10/09/18 (305)333-3333 ']
[' ']
[' ']
[' READ SVC B MAXIMUM TOTAL DUE METER NO REMARKS ']
[' ACCOUNT # SERVICE ADDRESS CITY DATE DAY C KWH KWD AMOUNT ']
[' RATE CODE CY CUSTOMER NAME MAILING ADDRESS ']
[' ---------------------------------------------------------------------------------------------------------------------------------- ']
[' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 22222-33333 485 JOHNSON AVE APT 3541 MIAMI 09/26/18 28 C 130 28.08 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 33333-44444 485 JOHNSON AVE APT 4544 MIAMI 09/26/18 28 C 172 32.42 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 55555-66666 485 JOHNSON ST AVE APT 1111 MIAMI 09/26/18 28 C 243 39.81 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
Answer 0 (score: 0)
Question: I want to parse the data into columns
Note: the simple regex used here also splits on - and /. You can avoid this by extending the regex as needed.
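As a rough sketch of what extending the regex could look like: a character class that also admits -, /, . and # keeps account numbers, dates and amounts in one piece. The pattern and the tokenize helper name are illustrative assumptions, not something taken from the statement file's format.

```python
import re

# Assumed pattern: a token is a run of word characters plus '#', '.', '/' and '-'
token_re = re.compile(r"[\w#./-]+")

def tokenize(line):
    """Return the tokens of a line, keeping '-', '/' and '.' inside a token."""
    return token_re.findall(line)

sample = " 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 "
print(tokenize(sample))
```

On the data rows shown in the question this behaves much like line.split(), so the account number, the date and the amount each stay a single token.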
import re

# Matches runs of word characters (letters, digits, underscore)
rc = re.compile(r'(\w+)')

with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as itxt:
    # Iterate over the file line by line, counting rows from 1
    for n, line in enumerate(itxt, 1):
        # Rows 13 and 14 hold the header
        if n in [13, 14]:
            print(rc.findall(line))
        # Data starts at row 16; every third row is blank
        if n >= 16 and n % 3 > 0:
            print(rc.findall(line))
Output:
['ACCOUNT', 'SERVICE', 'ADDRESS', 'CITY', 'DATE', 'DAY', 'C', 'KWH', 'KWD', 'AMOUNT']
['RATE', 'CODE', 'CY', 'CUSTOMER', 'NAME', 'MAILING', 'ADDRESS']
['11111', '22222', '485', 'JOHNSON', 'AVE', 'APT', '1405', 'MIAMI', '09', '26', '18', '28', 'C', '140', '29', '11', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['22222', '33333', '485', 'JOHNSON', 'AVE', 'APT', '3541', 'MIAMI', '09', '26', '18', '28', 'C', '130', '28', '08', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['33333', '44444', '485', 'JOHNSON', 'AVE', 'APT', '4544', 'MIAMI', '09', '26', '18', '28', 'C', '172', '32', '42', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['55555', '66666', '485', 'JOHNSON', 'ST', 'AVE', 'APT', '1111', 'MIAMI', '09', '26', '18', '28', 'C', '243', '39', '81', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
Tested with Python: 3.4.2
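To go from token lists to named columns, one possible approach is to match each pair of data lines against patterns with named groups and merge the two matches into one record. The sketch below rests on assumptions the question does not confirm: the city is a single word, the KWD column is empty as in the sample rows, the last token on the first line is the meter number, and the text after the rate code (customer name and mailing address) is left unsplit in rest because the sample does not show where one ends and the other begins. The field names and the parse_records helper are hypothetical.

```python
import re

# First line of a record (assumed layout):
# account, service address, city, read date, days, code, kWh, amount, meter number
first_re = re.compile(
    r"^\s*(?P<account>\d+-\d+)\s+(?P<service_address>.+?)\s+(?P<city>\S+)\s+"
    r"(?P<read_date>\d{2}/\d{2}/\d{2})\s+(?P<days>\d+)\s+(?P<code>\w)\s+"
    r"(?P<kwh>\d+)\s+(?P<amount>\d+\.\d{2})\s+(?P<meter_no>\S+)\s*$"
)
# Second line of a record (assumed layout): rate code, then name and mailing address
second_re = re.compile(r"^\s*(?P<rate_code>[A-Z]+-\d+)\s+(?P<rest>.+?)\s*$")

def parse_records(lines):
    """Pair each account line with the rate-code line that follows it."""
    records = []
    current = None
    for line in lines:
        m = first_re.match(line)
        if m:
            current = m.groupdict()
            records.append(current)
            continue
        m = second_re.match(line)
        if m and current is not None:
            current.update(m.groupdict())
            current = None
    return records

sample_lines = [
    " 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 ",
    " RS-1 XYZ COMPANY 485 JOHNSON AVE ",
    " ",
]
records = parse_records(sample_lines)
print(records[0]["account"], records[0]["city"], records[0]["rate_code"])
```

With a list of such dictionaries, csv.DictWriter from the standard library can write the records out as a real comma-separated file. If the original file still has its fixed-width spacing, slicing by column position may be more robust than these patterns.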