Converting a txt file to Excel in Python

Time: 2017-12-20 03:18:10

Tags: python python-3.x python-2.7

I am trying to convert a text file to an Excel sheet in Python. The txt file contains data in the format shown below:

[image: sample data]

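Column names: reg no, zip code, loc id, emp id, lastname, firstname. Each record has one or more error numbers, and each record has its column names listed above its values. I want to create an Excel sheet containing the reg no, firstname, lastname, and errors, with the errors listed in separate rows for each record.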

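How can I put the records into an Excel sheet? Should I use regular expressions? And how do I insert the error numbers in separate rows for the corresponding record?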

Expected output:

[image: expected output]

Here is the link to the input file: https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt

Any code snippets or suggestions are much appreciated.

2 Answers:

Answer 0 (score: 2)

Here is some draft code. Let me know if any changes are needed:

import csv

with open('in.txt') as f:
    # Python 2: the csv module needs binary mode here.
    # On Python 3 use open('out.csv', 'w', newline='') instead.
    with open('out.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)

        # Skip the initial clutter, up to and including the "INPUT DATA" line
        line = f.readline()
        while line and "INPUT DATA" not in line:
            line = f.readline()

        header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
        spamwriter.writerow(header)
        print(header)

        line = f.readline()
        # Stop at the "END" marker or at end of file (readline() returns '')
        while line and "END" not in line:
            fields = line.split()
            if fields and fields[0].isdigit():
                # A record line starts with the numeric REG NO
                data = fields
                errors = []
                f.readline()  # skip the blank line after the record
                line = f.readline()
                while "ERROR" in line:
                    errors.append(line.strip())
                    line = f.readline()
                # Errors are appended as extra columns after the record fields
                spamwriter.writerow(data + errors)
                continue  # `line` already holds the next unread line
            line = f.readline()

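Run this with Python 2. The errors are appended as subsequent columns. Laying them out the way you want is slightly more complicated; I can fix that if it is still needed.

The output looks like this:

[image: output]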
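
To get each error on its own row rather than in extra columns, one option is to expand every record into several rows before writing. This is a minimal sketch, assuming each record has already been parsed into a `data` list and an `errors` list as in the loop above. The helper name `expand_rows` and the sample values are hypothetical, and the snippet targets Python 3 (it writes to an in-memory buffer in place of the output file):

```python
import csv
import io


def expand_rows(data, errors):
    """Yield one CSV row per error; the record fields appear only on the first row."""
    if not errors:
        # A record without errors still gets a single row with an empty ERROR cell
        yield list(data) + [""]
        return
    blank = [""] * len(data)
    for i, error in enumerate(errors):
        yield (list(data) if i == 0 else blank) + [error]


# Hypothetical sample record; real values would come from the parsed input file
record = ["1001", "94105", "12", "7", "SMITH", "JOHN"]
errors = ["ERROR NUM 12", "ERROR NUM 45"]

buffer = io.StringIO()  # stands in for the output file
writer = csv.writer(buffer)
for row in expand_rows(record, errors):
    writer.writerow(row)
csv_text = buffer.getvalue()
```

In the draft above, this would mean replacing the single `writerow(data + errors)` call with a loop over `expand_rows(data, errors)`.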

Answer 1 (score: 1)

You can do this with the openpyxl library, which can write items directly into a spreadsheet. This code shows how to do it for your particular case.

from openpyxl import Workbook

NEW_PERSON, ERROR_LINE = 1, 2


def Line_items():
    """Yield (kind, payload) for each record line and each error line in the file."""
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue
            items = line.split()
            if items[0].isnumeric():
                # A record line starts with the numeric REG NO
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line
            # Every other line (column names, clutter) is ignored


wb = Workbook()
ws = wb.active

# Header row
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'

row = 2
first = True  # True until the current record's first error has been written
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2  # leave a blank row between records
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]
        ws['C{:d}'.format(row)] = data[-1]
        first = True
    else:
        # The first error shares the record's row; later errors get their own rows
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data

wb.save(filename='katherine.xlsx')

Here is a screenshot of the result.

[image: spreadsheet]
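
On the regular-expression question: the `split()`-based checks above are enough, but the same line classification could also be written with the `re` module. This is a minimal sketch, assuming record lines consist of six whitespace-separated fields starting with a numeric REG NO and error lines start with `ERROR NUM`. The patterns, the helper name `classify`, and the sample lines are hypothetical and not taken from the real input file:

```python
import re

# Assumed layout: six whitespace-separated fields, the first one numeric
RECORD_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")
# Assumed layout: error lines begin with the words ERROR NUM
ERROR_RE = re.compile(r"^\s*ERROR\s+NUM\b.*$")


def classify(line):
    """Return ('record', fields), ('error', text), or None for any other line."""
    match = RECORD_RE.match(line)
    if match:
        return "record", list(match.groups())
    if ERROR_RE.match(line):
        return "error", line.strip()
    return None


# Hypothetical sample lines in the assumed format
sample = [
    "REG NO  ZIP CODE  LOC ID  EMP ID  LASTNAME  FIRSTNAME",
    "1001  94105  12  7  SMITH  JOHN",
    "",
    "ERROR NUM 12",
]
results = [classify(line) for line in sample]
```

The regex version is stricter than `split()`: a record line with a different number of fields is silently skipped, which may or may not be what you want for the real file.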