How to convert a fixed-width text file to CSV using Python?

Posted: 2012-04-02 16:38:42

Tags: python csv text-files

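I've searched online but couldn't find an answer to my question. I need to convert a txt file to CSV. I've figured out how to do this when the file has a delimiter, but this txt file has no delimiters or headers, so I have to split it by fixed column widths instead. The file has millions of records. The column widths are 10,60,60,60,30,2,3,6,9,5,6,12,6,5,3,12,12,5,3.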
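Taking running totals of those widths shows where each field starts and ends and how long a full record is. A minimal sketch using `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

# Column widths listed in the question
WIDTHS = [10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3]

# Running totals give each field's end offset; each start is the previous end
ends = list(accumulate(WIDTHS))
starts = [0] + ends[:-1]
record_length = ends[-1]

print("number of columns:", len(WIDTHS))       # 19
print("record length:", record_length)         # 309
print("field 2 spans", (starts[1], ends[1]))   # (10, 70)
```

If the widths are right, every record should therefore be 309 characters long, not counting the line ending.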

My two challenges are:
1. Converting the file to CSV using the fixed widths listed above.
2. Inserting a header row.

The data looks like this:

0000000626ISOCKE BBBB ZZZZZ TW DARTMOUTH 10 FDSAF DR DARTMOUTH CASN 7H44DR SAAB -11.111111 22.2222222 000 -33.333333 44.4444444 000
0000000627ISOCKE FFFF TTTTT TW HALIFAX 3367 FDSAF RD HALIFAX CASN 8C5ASE SAAB -55.555555 66.6666666 000 -77.777777 88.8888888 000
0000000628ISOCKE RE CHARLOTTETOWN 449 UYRNT ECSARW RD CHARLOTTETOWN CAPE CSE8HR SAAB -99.999999 11.1111111 000 -22.222222 33.3333333 000

Again, I've managed to convert the file to CSV with this code, but the output isn't formatted correctly:

import csv

# newline='' stops the csv module from writing blank lines between rows on Windows
with open(r'C:\Users\...New Folder\practice.txt', 'r') as rf, \
        open(r'C:\Users\...New Folder\Book1.csv', 'w', newline='') as wf:
    writer = csv.writer(wf)
    for row in rf:
        # split() cuts on runs of whitespace, not at the fixed column widths
        writer.writerow(row.split())
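To see why whitespace splitting can't recover the columns, here is `str.split()` applied to the three sample rows exactly as they appear in the post (the original column padding may have been lost when the post was formatted):

```python
# The three sample rows as shown in the question
rows = [
    "0000000626ISOCKE BBBB ZZZZZ TW DARTMOUTH 10 FDSAF DR DARTMOUTH CASN 7H44DR SAAB -11.111111 22.2222222 000 -33.333333 44.4444444 000",
    "0000000627ISOCKE FFFF TTTTT TW HALIFAX 3367 FDSAF RD HALIFAX CASN 8C5ASE SAAB -55.555555 66.6666666 000 -77.777777 88.8888888 000",
    "0000000628ISOCKE RE CHARLOTTETOWN 449 UYRNT ECSARW RD CHARLOTTETOWN CAPE CSE8HR SAAB -99.999999 11.1111111 000 -22.222222 33.3333333 000",
]

# str.split() cuts at every run of whitespace: a field with no space before the
# next one stays glued to it, and a multi-word field is cut into pieces
token_counts = [len(row.split()) for row in rows]
print(token_counts)          # [18, 18, 17]
print(rows[0].split()[0])    # 0000000626ISOCKE
```

None of the rows yields the 19 expected columns, the 10-character ID stays glued to the next field, and the count differs from row to row, so the CSV columns can't line up.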

4 answers:

Answer 0 (score: 2)

Use struct to tear apart each fixed-width line, stripping the fields where appropriate.
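A minimal sketch of the struct approach, assuming the file is read in binary mode and uses a single-byte encoding (Latin-1 is assumed here) so that each width is also a byte count. `parse_line` is a hypothetical helper name:

```python
import struct

# Column widths from the question
WIDTHS = [10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3]

# One "<n>s" code per column ("10s60s60s..."); each unpacks to a bytes field
RECORD = struct.Struct("".join("%ds" % w for w in WIDTHS))


def parse_line(line):
    """Split one fixed-width record (bytes) into stripped text fields."""
    # Drop the line ending and pad short lines with spaces so that
    # unpack_from always has RECORD.size bytes to read
    record = line.rstrip(b"\r\n").ljust(RECORD.size)
    # Assumption: single-byte Latin-1 encoding, so 1 character == 1 byte
    return [field.decode("latin-1").strip() for field in RECORD.unpack_from(record)]


row = parse_line(b"0000000626ISOCKE BBBB\n")
print(row[:2])   # ['0000000626', 'ISOCKE BBBB']
print(len(row))  # 19
```

For a full conversion, each line of a file opened with `open(path, 'rb')` could be passed to `parse_line` and the resulting lists written out with `csv.writer`.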

Answer 1 (score: 2)

Use slice objects:

>>> widths = 1,2,3
>>> slices = []
>>> offset = 0
>>> for w in widths:
...     slices.append(slice(offset, offset + w))
...     offset += w
...
>>> slices
[slice(0, 1, None), slice(1, 3, None), slice(3, 6, None)]
>>> pieces = ["abcdef"[s] for s in slices]
>>> pieces
['a', 'bc', 'def']
>>>
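The slice idea extends to the whole task: build the slices once from the question's widths, write a header row, then stream the input one line at a time (which matters with millions of records). A sketch using the standard csv module; the header names are hypothetical placeholders because the question doesn't give them, and `make_slices` and `fixed_width_to_csv` are illustrative names:

```python
import csv
import io

# Column widths from the question
WIDTHS = [10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3]
# Hypothetical header names: the question does not say what the columns are
HEADER = ["col%d" % i for i in range(1, len(WIDTHS) + 1)]


def make_slices(widths):
    """Build one slice object per column, as in the answer above."""
    slices, offset = [], 0
    for w in widths:
        slices.append(slice(offset, offset + w))
        offset += w
    return slices


def fixed_width_to_csv(src, dst, widths, header):
    """Stream fixed-width lines from src and write them to dst as CSV."""
    slices = make_slices(widths)
    writer = csv.writer(dst)
    writer.writerow(header)  # header row goes first
    for line in src:  # one line at a time, so the whole file is never in memory
        line = line.rstrip("\r\n")
        writer.writerow([line[s].strip() for s in slices])


# Small in-memory demo with two columns of widths 10 and 6
demo_in = io.StringIO("0000000626ISOCKE\n0000000627ISOCKE\n")
demo_out = io.StringIO()
fixed_width_to_csv(demo_in, demo_out, [10, 6], ["id", "code"])
print(demo_out.getvalue())
```

With real files, the input would be opened normally and the output with `open(path, 'w', newline='')`, both in `with` blocks, and passed in with `WIDTHS` and `HEADER`. Writing through `csv.writer` rather than joining with commas means a field that itself contains a comma or quote gets quoted properly.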

Answer 2 (score: 1)

If anyone is still looking for a solution, I've written a small script in Python. It's easy to use, as long as you have Python 3.5.

https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py

"""
This script converts a fixed-width file into a delimited file; tested on Python 3.5 only.
Sample run (order of arguments doesn't matter):
python ConvertFixedToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
Inputs are as follows
1. Input File - Mandatory (argument -i) - File which has fixed-width data in it
2. Config File - Optional (argument -c; if not provided, the script looks for Config.txt in the current working directory and exits if it is not present)
    Should have the format
    FieldName,fieldLength
    eg:
    FirstName,10
    SecondName,8
    Address,30
    etc:
3. Output File - Optional (argument -o; if not provided, the input file name with "Delimited.txt" appended is used)
4. Delimiter - Optional (argument -d; if not provided, the default value is "|" (pipe))
"""
from collections import OrderedDict
import argparse
from argparse import ArgumentParser
import os.path
import sys


def slices(s, args):
    position = 0
    for length in args:
        length = int(length)
        yield s[position:position + length]
        position += length

def extant_file(x):
    """
    'Type' for argparse - checks that file exists but does not open.
    """
    if not os.path.exists(x):
        # Argparse uses the ArgumentTypeError to give a rejection message like:
        # error: argument input: x does not exist
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x


parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
parser.add_argument("-i", dest="InputFile", required=True,    help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-o", dest="OutputFile", required=False,    help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
parser.add_argument("-c", dest="ConfigFile", required=False,   help="Provide your Config file name here,File should have value as fieldName,fieldLength. if file is on different path than where this script resides then provide full path of the file", metavar="FILE",type=extant_file)
parser.add_argument("-d", dest="Delimiter", required=False,   help="Provide the delimiter string you want",metavar="STRING", default="|")

args = parser.parse_args()

# Input file is mandatory
InputFile = args.InputFile
#Delimiter by default "|"
DELIMITER = args.Delimiter

#Output file checks
if args.OutputFile is None:
    OutputFile = str(InputFile) + "Delimited.txt"
    print("Setting output file as " + OutputFile)
else:
    OutputFile = args.OutputFile

#Config file check
if args.ConfigFile is None:
    if not os.path.exists("Config.txt"):
        print("No config file provided; exiting the script")
        sys.exit(1)  # non-zero exit status signals the failure to the caller
    else:
        ConfigFile = "Config.txt"
        print ("Taking Config.txt file on this path as Default Config File")
else:
    ConfigFile = args.ConfigFile

fieldNames = []
fieldLength = []
myvars = OrderedDict()


with open(ConfigFile) as myfile:
    for line in myfile:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        name, var = line.partition(",")[::2]
        myvars[name.strip()] = int(var)
for key,value in myvars.items():
    fieldNames.append(key)
    fieldLength.append(value)

with open(OutputFile, 'w') as f1:
    fieldNames = DELIMITER.join(map(str, fieldNames))
    f1.write(fieldNames + "\n")
    with open(InputFile, 'r') as f:
        for line in f:
            # Drop the line ending so it never ends up inside the last field
            rec = list(slices(line.rstrip("\r\n"), fieldLength))
            myLine = DELIMITER.join(map(str, rec))
            f1.write(myLine + "\n")
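To use this script for the question's file, the config file needs one `FieldName,fieldLength` line per column. A small sketch that generates that text from the question's widths; the field names are hypothetical placeholders:

```python
# Column widths from the question
WIDTHS = [10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3]

# Hypothetical field names; replace them with the real column names
names = ["col%d" % i for i in range(1, len(WIDTHS) + 1)]

# One "FieldName,fieldLength" line per column, the format the script reads
config_text = "".join("%s,%d\n" % (name, width) for name, width in zip(names, WIDTHS))
print(config_text, end="")
```

Saved as `Config.txt`, it could be used with something like `python FixedWidthToDelimiter.py -i practice.txt -o Book1.csv -c Config.txt -d ","`. Note that the script joins the raw slices with the delimiter without stripping or quoting them, so padded fields keep their trailing spaces and a field that contains a comma would shift the columns.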

Answer 3 (score: 0)

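Personally, I've found the FixedWidth module works very well.

See https://pypi.org/project/FixedWidth/

It takes some setup, but you can describe each field as a string or a number (with a specified precision), left- or right-aligned, with its own fill character, and so on.

It's quite powerful, especially if you need to parse more than one type of file: you just supply the expected field descriptions and it does all the rest.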