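I searched online but could not find an answer to my problem. I need to convert a txt file to csv. I have figured out how to do this when there is a delimiter, but this txt file has no delimiters and no header, so I have to specify fixed column widths. The file has millions of records. The column widths are 10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3.
My two challenges are:
1. Convert the file to csv using the fixed widths listed above.
2. Insert a header.
The data looks like this: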
0000000626ISOCKE BBBB ZZZZZ TW DARTMOUTH 10 FDSAF DR DARTMOUTH CASN 7H44DR SAAB -11.111111 22.2222222 000 -33.333333 44.4444444 000
0000000627ISOCKE FFFF TTTTT TW HALIFAX 3367 FDSAF RD HALIFAX CASN 8C5ASE SAAB -55.555555 66.6666666 000 -77.777777 88.8888888 000
0000000628ISOCKE RE CHARLOTTETOWN 449 UYRNT ECSARW RD CHARLOTTETOWN CAPE CSE8HR SAAB -99.999999 11.1111111 000 -22.222222 33.3333333 000
Again, I have been able to convert the file to csv with the code below, but the output is not formatted correctly:
import csv
rf = open(r'C:\Users\...New Folder\practice.txt', 'r')  # input file handle
wf = open(r'C:\Users\...New Folder\Book1.csv', 'w')  # output file handle
writer = csv.writer(wf)
for row in rf.readlines():
    writer.writerow(row.split())
rf.close()  # close input file handle
wf.close()  # close output file handle
Answer 0 (score: 2)
Use struct
to tear apart each fixed-width line, trimming the fields where appropriate.
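A minimal sketch of that idea, assuming ASCII data and a made-up three-column layout (the widths 10, 6, 4 and the helper name `parse_line` are illustrative only, not the asker's real layout). `struct` works on bytes, so the line is padded or truncated to the exact record size before unpacking:

```python
import struct

# Illustrative widths only; substitute the real column widths.
widths = (10, 6, 4)

# Build a format such as "10s6s4s": one fixed-length byte string per column.
record = struct.Struct("".join("{}s".format(w) for w in widths))


def parse_line(line):
    """Split one fixed-width line (bytes) into stripped text fields."""
    # unpack() requires exactly record.size bytes, so drop the line ending,
    # pad short lines with spaces and cut off anything beyond the last column.
    body = line.rstrip(b"\r\n").ljust(record.size)[:record.size]
    return [field.decode("ascii").strip() for field in record.unpack(body)]


fields = parse_line(b"0000000626ISOCKETW  \n")
print(fields)
```

To use this on a file, it would need to be opened in binary mode (`'rb'`) so that each line arrives as bytes.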
Answer 1 (score: 2)
Use slice objects:
>>> widths = 1, 2, 3
>>> slices = []
>>> offset = 0
>>> for w in widths:
...     slices.append(slice(offset, offset + w))
...     offset += w
...
>>> slices
[slice(0, 1, None), slice(1, 3, None), slice(3, 6, None)]
>>> pieces = ["abcdef"[s] for s in slices]
>>> pieces
['a', 'bc', 'def']
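Applied to the question, the same slices can drive `csv.writer`, which also covers the header. The following is only a sketch: the widths are the ones listed in the question, but the header names `col1` to `col19` and the function names are placeholders, since the question does not give the real column names.

```python
import csv
import io
from itertools import accumulate

# Column widths as listed in the question.
WIDTHS = (10, 60, 60, 60, 30, 2, 3, 6, 9, 5, 6, 12, 6, 5, 3, 12, 12, 5, 3)
# Placeholder header names; replace them with the real ones.
HEADER = ["col{}".format(i) for i in range(1, len(WIDTHS) + 1)]


def make_slices(widths):
    """Turn a sequence of widths into a list of slice objects."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return [slice(start, end) for start, end in zip(starts, ends)]


def fixed_width_to_csv(infile, outfile, widths=WIDTHS, header=HEADER):
    """Read fixed-width lines from infile and write CSV rows to outfile."""
    slices = make_slices(widths)
    writer = csv.writer(outfile)
    writer.writerow(header)
    # Iterating over the file object reads one line at a time, so a file
    # with millions of records never has to fit in memory.
    for line in infile:
        line = line.rstrip("\r\n")
        writer.writerow([line[s].strip() for s in slices])


# Small in-memory demonstration with two columns of widths 2 and 3.
demo_out = io.StringIO()
fixed_width_to_csv(io.StringIO("ab  c\n"), demo_out, widths=(2, 3), header=["a", "b"])
```

With real files, the output should be opened with `newline=''` (for example `open(out_path, 'w', newline='')`) so that the csv module controls the line endings.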
Answer 2 (score: 1)
In case anyone is still looking for a solution, I developed a small script in Python. It is easy to use as long as you have Python 3.5.
https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py
"""
This script converts a fixed-width file into a delimited file. Tried on Python 3.5 only.
Sample run (the order of the arguments doesn't matter):
python FixedWidthToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
Inputs are as follows:
1. Input file - mandatory (argument -i) - the file that contains the fixed-width data
2. Config file - optional (argument -c; if not provided, the script looks for Config.txt in the current directory and does not run if it is missing)
    Should have the format
        FieldName,fieldLength
    e.g.:
        FirstName,10
        SecondName,8
        Address,30
    etc.
3. Output file - optional (argument -o; if not provided, the input file name plus "Delimited.txt" is used)
4. Delimiter - optional (argument -d; if not provided, the default value is "|" (pipe))
"""
from collections import OrderedDict
import argparse
from argparse import ArgumentParser
import os.path
import sys


def slices(s, args):
    """Yield consecutive substrings of s whose lengths are given by args."""
    position = 0
    for length in args:
        length = int(length)
        yield s[position:position + length]
        position += length


def extant_file(x):
    """
    'Type' for argparse - checks that file exists but does not open.
    """
    if not os.path.exists(x):
        # Argparse uses the ArgumentTypeError to give a rejection message like:
        # error: argument input: x does not exist
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x


parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
parser.add_argument("-i", dest="InputFile", required=True, help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-o", dest="OutputFile", required=False, help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
parser.add_argument("-c", dest="ConfigFile", required=False, help="Provide your Config file name here,File should have value as fieldName,fieldLength. if file is on different path than where this script resides then provide full path of the file", metavar="FILE",type=extant_file)
parser.add_argument("-d", dest="Delimiter", required=False, help="Provide the delimiter string you want",metavar="STRING", default="|")
args = parser.parse_args()
# Input file is mandatory
InputFile = args.InputFile
# Delimiter defaults to "|"
DELIMITER = args.Delimiter
# Output file: default to the input file name plus "Delimited.txt"
if args.OutputFile is None:
    OutputFile = str(InputFile) + "Delimited.txt"
    print("Setting Output file as " + OutputFile)
else:
    OutputFile = args.OutputFile
# Config file: fall back to Config.txt in the current directory
if args.ConfigFile is None:
    if not os.path.exists("Config.txt"):
        print("There is no Config File provided exiting the script")
        sys.exit()
    else:
        ConfigFile = "Config.txt"
        print("Taking Config.txt file on this path as Default Config File")
else:
    ConfigFile = args.ConfigFile
fieldNames = []
fieldLength = []
myvars = OrderedDict()
# Read the "FieldName,fieldLength" pairs, preserving their order
with open(ConfigFile) as myfile:
    for line in myfile:
        if not line.strip():
            continue  # skip blank lines, which would otherwise fail in int()
        name, var = line.partition(",")[::2]
        myvars[name.strip()] = int(var)
for key, value in myvars.items():
    fieldNames.append(key)
    fieldLength.append(value)
with open(OutputFile, 'w') as f1:
    # Write the header row first
    fieldNames = DELIMITER.join(map(str, fieldNames))
    f1.write(fieldNames + "\n")
    with open(InputFile, 'r') as f:
        for line in f:
            # Drop the line ending so it cannot end up inside the last field
            rec = list(slices(line.rstrip("\r\n"), fieldLength))
            myLine = DELIMITER.join(map(str, rec))
            f1.write(myLine + "\n")
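Answer 3 (score: 0)
Personally, the FixedWidth module has worked very well for me.
See https://pypi.org/project/FixedWidth/
It needs some setup, but you can describe each field as a string or a number (with the option of a specified precision), as left-aligned or right-aligned, with a padding character per field, and so on.
It is very powerful, especially if you need to parse more than one file type: you just supply the expected field descriptions and it does everything else.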
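To illustrate what "describing the fields" means, here is a standard-library sketch of the general idea. It is not the FixedWidth package's API: the `Field` class, the `LAYOUT` list, the field names and `parse_record` are all hypothetical, made up for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one fixed-width column."""
    name: str
    width: int
    kind: str = "string"  # "string", "integer" or "decimal"


# Made-up layout for illustration; one such list per file type.
LAYOUT = [
    Field("record_id", 10, "integer"),
    Field("name", 8),
    Field("latitude", 10, "decimal"),
]


def parse_record(line, layout=LAYOUT):
    """Cut a line according to the layout and convert each field."""
    record, pos = {}, 0
    for field in layout:
        raw = line[pos:pos + field.width].strip()
        pos += field.width
        if field.kind == "integer":
            record[field.name] = int(raw)  # int() accepts leading zeros
        elif field.kind == "decimal":
            record[field.name] = Decimal(raw)
        else:
            record[field.name] = raw
    return record


row = parse_record("0000000626ISOCKE  -11.111111")
```

Supporting a second file type then only means writing a second layout list; the parsing function stays the same.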