Reading CSV in Python where input lines are split

Posted: 2017-10-17 07:54:31

Tags: python csv

I have a kind of CSV file in which the input for one logical line can be split across several physical lines.

Sample data:

":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"

These are four logical lines. A continuation is indicated by a comma at the end of the physical line.

I tried using the csv module in Python:
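The trailing comma is visible to the csv module as an empty last field, which is what makes continuation lines detectable. A minimal check on two of the sample lines, passed as a list of strings instead of a file (csv.reader accepts any iterable of strings):

```python
import csv

# Two physical lines from the sample; the second ends with a continuation comma
sample = [
    '":T1","A1","B1","C1"',
    '":T2","A2","B2","C2",',
]

rows = list(csv.reader(sample))
for row in rows:
    print(row)
# prints:
# [':T1', 'A1', 'B1', 'C1']
# [':T2', 'A2', 'B2', 'C2', '']
```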

import csv
with open('2.dat', 'r') as csvfile:
    datreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in datreader:
        print(', '.join(row))
        print("*******************************")

This gives:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2,
*******************************
D2, E2
*******************************
:T3, A3, B3, C3,
*******************************
D3
*******************************
:T4, A4
*******************************

What I want:

:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2, D2, E2
*******************************
:T3, A3, B3, C3, D3
*******************************
:T4, A4
*******************************

I'm not sure what the best way is to parse this data correctly with the csv module. The input dataset may be millions of lines long.
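Since the input may be millions of lines, one option is a generator that wraps csv.reader and merges rows as it goes, so only the current logical line is held in memory. This is a sketch that assumes, as the sample suggests, that a physical line ending in a trailing comma (seen by csv.reader as an empty last field) always continues on the next line; a record whose genuine last value is an empty quoted field ("") would be misread as a continuation under this rule. The name logical_rows is hypothetical.

```python
import csv
from typing import Iterable, Iterator, List


def logical_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one merged field list per logical line.

    Assumption: a physical line whose last field is empty (i.e. it ends
    with a trailing comma) continues on the next physical line.
    """
    pending: List[str] = []
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if row[-1] == '':
            # Continuation: drop the empty field left by the trailing comma
            pending.extend(row[:-1])
        else:
            yield pending + row
            pending = []
    if pending:
        # Input ended on a continuation line; emit what was collected
        yield pending


# Sample data from the question, as a list of physical lines
sample = '''":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"'''.splitlines()

merged = list(logical_rows(sample))
for row in merged:
    print(', '.join(row))
    print('*' * 31)
```

With a real file, the open file object can be passed in place of the list, e.g. inside `with open('2.dat', newline='') as f:` iterate over `logical_rows(f)`; the csv documentation recommends opening files with newline='' when using csv.reader.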

4 answers:

Answer 0 (score: 2)

One approach is to first fix your file so it conforms to the CSV standard, and then parse it.

Based on your sample data:

data = """
":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"
""".strip('\n')

A simple regular expression can merge the split lines:

import re
parsed = re.sub(r',\n', ",", data)
print(parsed)

It returns:

":T1","A1","B1","C1"
":T2","A2","B2","C2","D2","E2"
":T3","A3","B3","C3","D3"
":T4","A4"

This conforms to the CSV standard and can be parsed easily.
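For completeness, the corrected text can be handed straight to csv.reader by wrapping it in io.StringIO. A minimal sketch on the sample data; note that this approach holds the whole input in memory, and it assumes no quoted field itself contains a comma followed by a newline (the regex would join those too).

```python
import csv
import io
import re

data = '''":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"'''

# Merge each physical line ending in a comma with the line that follows it
fixed = re.sub(r',\n', ',', data)

# io.StringIO makes the corrected string look like a file to csv.reader
rows = list(csv.reader(io.StringIO(fixed)))
for row in rows:
    print(', '.join(row))
```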

Answer 1 (score: 1)

Another "trick", using the end parameter of the print function:

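import csv

with open('2.dat', 'r') as f:
    reader = csv.reader(f)
    for i, r in enumerate(reader):
        # A row starting with ':T' begins a new logical line;
        # print the separator before every logical line except the first
        if r[0].startswith(':T') and i > 0:
            print('\n', '*' * 30, sep='')
        # end='' keeps continuation rows on the same output line
        print(', '.join(r), end='')
    print()  # finish the last output line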

Output:

:T1, A1, B1, C1
******************************
:T2, A2, B2, C2, D2, E2
******************************
:T3, A3, B3, C3, D3
******************************
:T4, A4

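Answer 2 (score: 0)

I don't think there is a magic CSV parser that does exactly what you want. You will have to do a bit of the work yourself.

Create a new, empty list of rows. Loop over the rows in datreader. If a row starts with ":", append it to the new list. If not, join it onto the last row in the new list.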
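The steps above could look roughly like this. A sketch, assuming (as in the sample) that new records start with ":" and that the empty last field produced by a trailing comma should be dropped before joining; unlike a generator, it keeps every merged row in the list new_rows, which may matter for millions of lines.

```python
import csv
from typing import List

# Sample data from the question (an open file object would work the same way)
sample = [
    '":T1","A1","B1","C1"',
    '":T2","A2","B2","C2",',
    '"D2","E2"',
    '":T3","A3","B3","C3",',
    '"D3"',
    '":T4","A4"',
]

new_rows: List[List[str]] = []
for row in csv.reader(sample):
    if not row:
        continue  # skip blank lines
    if row[0].startswith(':') or not new_rows:
        # A row starting with ':' begins a new logical line
        # (so does a leading row, since there is nothing to join it to)
        new_rows.append(row)
    else:
        last = new_rows[-1]
        if last and last[-1] == '':
            last.pop()  # drop the empty field left by the trailing comma
        last.extend(row)

for row in new_rows:
    print(', '.join(row))
```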

Answer 3 (score: 0)

This should do it:

:T1, A1, B1, C1
:T2, A2, B2, C2, D2, E2
:T3, A3, B3, C3, D3
:T4, A4