Question

I have a .txt file like the sample shown below. I would like to convert it to a .csv table, but I have not had much success.

Mack3                                            Line Item Journal                                        Time 14:22:33     Date  03.10.2015
Panteni    Ledger 1L                                                                                    TGEPIO00/CANTINAOAS Page      20.001
--------------------------------------------------------------------------------------------------------------------------------------------
|    Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account   |User Name   |LCurr|      Amount in LC|Tx|Assignment        |S|
|------------------------------------------------------------------------------------------------------------------------------------------|
|    07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             0,85 |  |20140107          | |
|    07.01.2014|07.02.2014|4919065298| 29|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             2,53 |  |20140107          | |
|    07.01.2014|07.02.2014|4919235298| 30|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            30,00 |  |20140107          | |
|    07.01.2014|07.02.2014|4119005298| 32|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             1,00 |  |20140107          | |
|    07.01.2014|07.02.2014|9019005298| 34|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            11,10 |  |20140107          | |
|------------------------------------------------------------------------------------------------------------------------------------------|

The file in question has the structure of an SAP report. While practising Python and reading other posts, I found the following code:

    with open('file.txt', 'rb') as f_input:
        for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1].isalpha(), f_input):
            header = [cols.strip() for cols in next(csv.reader(StringIO(line), delimiter='|', skipinitialspace=True))][1:-1]
            break
    with open('file.txt', 'rb') as f_input, open(str(ii + 1) + 'output.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(header)
        for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] != '-' and not x[1].isalpha(), f_input):
            csv_input = csv.reader(StringIO(line), delimiter='|', skipinitialspace=True)
            csv_output.writerow(csv_input)

Unfortunately, it does not work in my case. It actually creates empty .csv files, and it does not seem to read csv_input correctly.

Any possible solutions?

Answer 1

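Your input file can be treated as CSV once we filter out a few lines, namely those that do not start with a pipe symbol '|' followed by a space ' '. That leaves us with this: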

|    Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account   |User Name   |LCurr|      Amount in LC|Tx|Assignment        |S|
|    07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             0,85 |  |20140107          | |
|    07.01.2014|07.02.2014|4919065298| 29|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             2,53 |  |20140107          | |
|    07.01.2014|07.02.2014|4919235298| 30|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            30,00 |  |20140107          | |
|    07.01.2014|07.02.2014|4119005298| 32|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             1,00 |  |20140107          | |
|    07.01.2014|07.02.2014|9019005298| 34|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            11,10 |  |20140107          | |

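Your output is mostly empty because the x[1].isalpha() check is never true for this data. In the header and data rows, the character at position 1 is always a space, never a letter.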
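To make this concrete, here is a small sketch that applies both tests to shortened sample lines from the report. The helper names is_header_original and is_table_row are made up for this illustration and are not part of the question's code.

```python
# Shortened sample lines taken from the report in the question.
header_line = "|    Pstng Date|Entry Date|DocumentNo|"
data_line = "|    07.01.2014|07.02.2014|4919005298|"
separator_line = "|---------------------------------|"
samples = (header_line, data_line, separator_line)


def is_header_original(line):
    """Header test from the question: a pipe followed by a letter."""
    return len(line) > 2 and line[0] == '|' and line[1].isalpha()


def is_table_row(line):
    """Alternative test: a pipe followed by a space."""
    return len(line) > 2 and line[0] == '|' and line[1] == ' '


original_results = [is_header_original(line) for line in samples]
table_results = [is_table_row(line) for line in samples]

print(original_results)  # [False, False, False]
print(table_results)     # [True, True, False]
```

The original test rejects every line, so no header is ever found, while the pipe-plus-space test keeps the header and data rows and drops the separator.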

There is no need to open the input file more than once; we can read, filter, and write the output in a single pass:

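import csv

ii = 0  # placeholder for the counter the question uses to name the output file

with open('file.txt', 'r', encoding='utf8', newline='') as f_input, \
     open(str(ii + 1) + 'output.csv', 'w', encoding='utf8', newline='') as f_output:

    # Keep only the table rows: lines that start with '|' followed by a space.
    input_lines = filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] == ' ', f_input)

    csv_input = csv.reader(input_lines, delimiter='|')
    csv_output = csv.writer(f_output)

    for row in csv_input:
        # Drop the empty first and last columns and trim the padding in each cell.
        csv_output.writerow(col.strip() for col in row[1:-1])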

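Notes:

You should not use binary mode when reading text files. Use the r and w modes respectively, and declare the file encoding explicitly. Choose the encoding that matches your file.
To work with the csv module, open the files with newline='' (this lets the csv module choose the correct line endings).
You can wrap multiple files in one with statement by using \ at the end of the line.
StringIO is completely unnecessary.
I do not use skipinitialspace=True because some of the columns also have trailing spaces. Instead, I call .strip() manually on every value when writing the row.
[1:-1] is needed to get rid of the superfluous empty columns (before the first and after the last | in the input).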
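The last two notes can be checked on a single line. This is a minimal sketch using a shortened row (fewer columns than the real report, chosen only for illustration); it shows the empty edge columns that csv.reader produces and what .strip() removes.

```python
import csv

# A shortened data row; the real report has more columns.
line = "|    07.01.2014| 36|    |tARFooWMOND |             0,85 | |"

# csv.reader accepts any iterable of strings, so a one-element list works.
row = next(csv.reader([line], delimiter='|'))
print(len(row), repr(row[0]), repr(row[-1]))  # 8 '' ''

# Drop the empty edge columns and trim padding on both sides of each cell.
cleaned = [col.strip() for col in row[1:-1]]
print(cleaned)  # ['07.01.2014', '36', '', 'tARFooWMOND', '0,85', '']

# skipinitialspace=True only skips spaces right after the delimiter,
# so padding at the end of a cell is still there.
with_skip = next(csv.reader([line], delimiter='|', skipinitialspace=True))
print(repr(with_skip[4]))
```

The text before the leading pipe and after the trailing pipe is empty, which is where the two extra columns come from.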

The output looks like this:

Pstng Date,Entry Date,DocumentNo,Itm,Doc..Date,BusA,PK,SG,Sl,Account,User Name,LCurr,Amount in LC,Tx,Assignment,S
07.01.2014,07.02.2014,4919005298,36,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"0,85",,20140107,
07.01.2014,07.02.2014,4919065298,29,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"2,53",,20140107,
07.01.2014,07.02.2014,4919235298,30,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"30,00",,20140107,
07.01.2014,07.02.2014,4119005298,32,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"1,00",,20140107,
07.01.2014,07.02.2014,9019005298,34,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"11,10",,20140107,
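The amounts are quoted because they contain a comma, which is also the default CSV delimiter. If the quoting is unwanted, one option is a different output delimiter. The sketch below is only an illustration: write_row is a hypothetical helper, and it writes to an in-memory buffer so the result can be inspected without touching any file.

```python
import csv
import io

# A shortened output row; '0,85' uses a decimal comma as in the report.
row = ['07.01.2014', '4919005298', 'EUR', '0,85']


def write_row(values, **writer_options):
    """Write one row with csv.writer and return the resulting text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, **writer_options).writerow(values)
    return buffer.getvalue()


default_line = write_row(row)
semicolon_line = write_row(row, delimiter=';')

print(repr(default_line))    # '07.01.2014,4919005298,EUR,"0,85"\r\n'
print(repr(semicolon_line))  # '07.01.2014;4919005298;EUR;0,85\r\n'
```

Either form is valid CSV; whichever program reads the file only needs to be told which delimiter was used.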
