Python pyparsing: parsing an unstructured text file

Time: 2019-12-09 12:13:53

Tags: python pyparsing

I have the text file below and would like to convert it to a CSV file.

+---------------------+--------------+---------------+
| column_date         | column_id    | column_desc   |
+---------------------+--------------+---------------+
| 2001-01-01 00:00:00 | 12345        | abc bar       |
| 2001-01-01 00:00:00 | 4567         | defg          |
+---------------------+--------------+---------------+

The expected output I am looking for is:

column_date,column_id,column_desc
2001-01-01 00:00:00,12345,abc bar
2001-01-01 00:00:00,4567,defg

Is there an example of doing this with pyparsing? Thanks.

1 answer:

Answer 0: (score: 0)

A possible solution

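import re

# The input is the ASCII table, even though the file is named file.csv here.
with open("file.csv", "r") as myFile:
    content = myFile.read()

# One data row: | <date> | <id> | <description> |
regex = r'^\|\s+(.+)\s+\|\s+(\w+)\s+\|\s+(.+)\s+\|$'
print(content)

# re.MULTILINE makes ^ and $ match on every line; border lines do not match.
matches = re.findall(regex, content, re.MULTILINE)
for row in matches:
    print(",".join(row))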

Output

|---------------------+-----------+-------------|
| column_date         | column_id | column_desc |
|---------------------+-----------+-------------|
| 2001-01-01 00:00:00 |     12345 | abc bar     |
| 2001-01-01 00:00:00 |      4567 | defg        |
|---------------------+-----------+-------------|

column_date        ,column_id,column_desc
2001-01-01 00:00:00,12345,abc bar    
2001-01-01 00:00:00,4567,defg    

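You may want to strip the unwanted whitespace before printing: the greedy (.+) groups keep the padding at the end of a cell.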
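
One way to drop that padding is to split each data line on the pipe character and strip every cell, instead of relying on the regex groups. The following is only a minimal sketch using the standard library (not pyparsing); the names TABLE, table_to_rows and rows_to_csv are made up for illustration, and it assumes that every data line starts and ends with "|" and that no cell contains a "|" itself.

```python
import csv
import io

# Sample input copied from the question.
TABLE = """\
+---------------------+--------------+---------------+
| column_date         | column_id    | column_desc   |
+---------------------+--------------+---------------+
| 2001-01-01 00:00:00 | 12345        | abc bar       |
| 2001-01-01 00:00:00 | 4567         | defg          |
+---------------------+--------------+---------------+
"""


def table_to_rows(text):
    """Return the cells of each data line with the padding stripped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Keep only lines of the form | ... |; this skips blank lines and
        # borders such as +----+----+.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        # Skip borders drawn with pipes at the edges, such as |----+----|.
        if set(line) <= set("|+-"):
            continue
        # Splitting yields an empty string before the first pipe and after
        # the last one, so drop both ends and trim what remains.
        rows.append([cell.strip() for cell in line.split("|")[1:-1]])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(table_to_rows(TABLE)), end="")
```

Using csv.writer rather than joining with "," means a description that happens to contain a comma or a quote would still be quoted correctly in the output.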