我有一个.txt文件,如下面报告的示例所示。我想将其转换为.csv表,但没有取得太大的成功。
Mack3 Line Item Journal Time 14:22:33 Date 03.10.2015
Panteni Ledger 1L TGEPIO00/CANTINAOAS Page 20.001
--------------------------------------------------------------------------------------------------------------------------------------------
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S|
|------------------------------------------------------------------------------------------------------------------------------------------|
| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | |
| 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | |
| 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | |
| 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | |
| 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
|------------------------------------------------------------------------------------------------------------------------------------------|
有问题的文件是SAP报告的结构。练习python并查看其他文章,我发现了以下代码:
with open('file.txt', 'rb') as f_input:
for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1].isalpha(), f_input):
header = [cols.strip() for cols in next(csv.reader(StringIO(line), delimiter='|', skipinitialspace=True))][1:-1]
break
with open('file.txt', 'rb') as f_input, open(str(ii + 1) + 'output.csv', 'wb') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(header)
for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] != '-' and not x[1].isalpha(), f_input):
csv_input = csv.reader(StringIO(line), delimiter='|', skipinitialspace=True)
csv_output.writerow(csv_input)
不幸的是,它不适用于我的情况。实际上,它创建了空的.csv文件,并且似乎无法正确读取csv_input。
任何可能的解决方案?
答案 0 :(得分:0)
一旦我们过滤掉几行,您的输入文件就可以被视为CSV,即那些不以管道符号'|'
开头,后跟空格' '
的行,这将使我们这个:
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S| | 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | | | 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | | | 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | | | 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | | | 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
您的输出主要为空,因为x[1].isalpha()
检查在此数据上永远不会为真。每行第1位的字符始终是空格,绝不能是字母。
不必多次打开输入文件,我们可以一次性读取,过滤和写入输出:
import csv
ii = 0
with open('file.txt', 'r', encoding='utf8', newline='') as f_input, \
open(str(ii + 1) + 'output.csv', 'w', encoding='utf8', newline='') as f_output:
input_lines = filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] == ' ', f_input)
csv_input = csv.reader(input_lines, delimiter='|')
csv_output = csv.writer(f_output)
for row in csv_input:
csv_output.writerow(col.strip() for col in row[1:-1])
注意:
r
和w
模式,并显式声明文件编码。选择适合您文件的编码。csv
模块,请使用newline=''
打开文件(让csv
模块选择正确的行尾)with
在\
语句中包装多个文件。StringIO
完全不需要。skipinitialspace=True
,因为某些列的末尾也有空格。因此,在写行时,我在每个值上手动调用.strip()
。[1:-1]
是摆脱多余的空列所必需的(在输入的第一个|
之前和之后)输出如下
Pstng Date,Entry Date,DocumentNo,Itm,Doc..Date,BusA,PK,SG,Sl,Account,User Name,LCurr,Amount in LC,Tx,Assignment,S 07.01.2014,07.02.2014,4919005298,36,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"0,85",,20140107, 07.01.2014,07.02.2014,4919065298,29,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"2,53",,20140107, 07.01.2014,07.02.2014,4919235298,30,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"30,00",,20140107, 07.01.2014,07.02.2014,4119005298,32,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"1,00",,20140107, 07.01.2014,07.02.2014,9019005298,34,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"11,10",,20140107,