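I am trying to convert a text file into an Excel sheet in Python. The txt file contains data in the format specified below, with the column names: reg no, zip code, loc id, emp id, lastname, firstname. Each record has one or more error numbers. For each record, the column names are listed above the values. I want to create an Excel sheet that contains reg no, firstname, lastname and errors, with each error listed on a separate row of its record.
How do I get the records into the Excel sheet? Should I use regular expressions? How do I insert the error numbers on separate rows for the corresponding record?
Expected output:
Here is a link to the input file: https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt
Any code snippets or suggestions are much appreciated.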
Answer 0 (score: 2)
Here is a draft of the code. Let me know if anything needs to be changed:
import csv

with open('in.txt') as f, open('out.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)

    # Remove the initial clutter: skip everything up to the "INPUT DATA" line
    line = f.readline()
    while line and "INPUT DATA" not in line:
        line = f.readline()

    header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
    spamwriter.writerow(header)
    print(header)

    line = f.readline()
    # Stop at the "END" marker or at end of file (readline() returns '')
    while line and "END" not in line:
        fields = line.split()
        # A record line starts with a numeric REG NO; anything else is skipped
        if fields and fields[0].isdigit():
            f.readline()  # get rid of the blank line that follows the record
            line = f.readline()
            errors = []
            while "ERROR" in line:
                errors.append(line.strip())
                line = f.readline()
            spamwriter.writerow(fields + errors)
            # `line` already holds the next unread line, so check it before reading more
            continue
        line = f.readline()

Run with Python 3. The errors are appended as extra columns of the record's row. Putting each error on its own row, as asked, is a little more involved; I can work that out if it's still needed.
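A minimal sketch of the one-error-per-row layout, assuming the loop is changed so that, instead of writing each record immediately, it collects one pair of (split record fields, list of ERROR lines) per record. The helper names `rows_one_error_per_line` and `write_records` are hypothetical, and the record values in the demo are made-up sample data, not taken from the linked file.

```python
import csv
import io


def rows_one_error_per_line(records):
    """Yield one output row per error line.

    `records` is an iterable of (fields, errors) pairs, where `fields` is the
    whitespace-split record line (REG NO first, LASTNAME and FIRSTNAME last)
    and `errors` is a list of ERROR lines. Only the first row of a record
    carries REG NO, LASTNAME and FIRSTNAME; the remaining rows leave them empty.
    """
    for fields, errors in records:
        reg_no, lastname, firstname = fields[0], fields[-2], fields[-1]
        # A record without errors still gets one row, with an empty ERROR cell
        for i, error in enumerate(errors or [""]):
            if i == 0:
                yield [reg_no, lastname, firstname, error]
            else:
                yield ["", "", "", error]


def write_records(records, csvfile):
    """Write a header plus the expanded rows to an open text file."""
    writer = csv.writer(csvfile)
    writer.writerow(["REG NO", "LASTNAME", "FIRSTNAME", "ERROR"])
    writer.writerows(rows_one_error_per_line(records))


if __name__ == "__main__":
    # Made-up sample record, only to show the shape of the output
    sample = [(["1001", "12345", "L1", "E9", "SMITH", "ANNA"],
               ["ERROR NUM 12", "ERROR NUM 34"])]
    out = io.StringIO()
    write_records(sample, out)
    print(out.getvalue())
```

For a real file, open it with `open('out.csv', 'w', newline='')` and pass that handle to `write_records`; Excel opens the resulting CSV with the second and later errors of a record on their own rows below the name.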
Answer 1 (score: 1)
You can do this with the openpyxl library, which can write items directly into a spreadsheet. This code shows how to do it for your specific case:
from openpyxl import Workbook

NEW_PERSON, ERROR_LINE = 1, 2

def Line_items():
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue
            items = line.split()
            if items[0].isnumeric():
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line

wb = Workbook()
ws = wb.active
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'

row = 2
# No person seen yet, so a stray ERROR line cannot overwrite the header row
first = False
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2  # leave a blank row between people
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]
        ws['C{:d}'.format(row)] = data[-1]
        first = True
    else:
        # The first error goes on the person's own row; later ones go below it
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data

wb.save(filename='katherine.xlsx')
Here is a screenshot of the result.
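`Line_items` opens `katherine.txt` itself, which makes the grouping logic awkward to try on a small sample. Below is a sketch of the same rules (a numeric first token starts a new person; a line starting with `ERROR NUM` belongs to the most recent person) written as a function over any iterable of lines. The name `group_records` and its dict keys are hypothetical, and the sample lines are invented for illustration.

```python
def group_records(lines):
    """Group each record line with the ERROR NUM lines that follow it.

    A line whose first token is numeric starts a new record (REG NO first,
    LASTNAME and FIRSTNAME last); a line starting with 'ERROR NUM' is
    attached to the most recent record; every other line is ignored.
    """
    records = []
    for line in lines:
        items = line.split()
        if not items:
            continue  # blank line
        if items[0].isnumeric():
            records.append({"reg_no": int(items[0]),
                            "lastname": items[-2],
                            "firstname": items[-1],
                            "errors": []})
        elif items[:2] == ["ERROR", "NUM"] and records:
            records[-1]["errors"].append(line.strip())
    return records


if __name__ == "__main__":
    # Invented sample lines; the header line is skipped because 'REG' is not numeric
    sample = [
        "REG NO ZIP CODE LOC ID EMP ID LASTNAME FIRSTNAME",
        "1001 12345 L1 E9 SMITH ANNA",
        "",
        "ERROR NUM 12",
        "ERROR NUM 34",
    ]
    for record in group_records(sample):
        print(record)
```

Against the real input, `with open('katherine.txt') as f: records = group_records(f)` gives one dict per person; each can then be written row by row with openpyxl's `ws.append([...])`, which adds a row below the last used one. Keeping parsing separate from writing lets the grouping be checked without creating a workbook.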