I have a CSV file in which one logical row of input can be split across several physical lines.
Sample data:
":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"
These are four logical rows; a continuation is indicated by a trailing comma at the end of the line.
I tried using the csv module in Python:
import csv
with open('2.dat', 'r') as csvfile:
    datreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in datreader:
        print(', '.join(row))
        print("*******************************")
This gives:
:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2,
*******************************
D2, E2
*******************************
:T3, A3, B3, C3,
*******************************
D3
*******************************
:T4, A4
*******************************
What I want:
:T1, A1, B1, C1
*******************************
:T2, A2, B2, C2, D2, E2
*******************************
:T3, A3, B3, C3, D3
*******************************
:T4, A4
*******************************
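I am not sure of the best way to parse this data correctly with the csv module. The input data set may run to millions of lines.
Answer 0 (score: 2)
One approach is to first correct your file so that it conforms to the CSV standard, and then parse it.
Based on your sample data: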
data = """
":T1","A1","B1","C1"
":T2","A2","B2","C2",
"D2","E2"
":T3","A3","B3","C3",
"D3"
":T4","A4"
""".strip('\n')
A simple regular expression can merge the split lines:
import re
parsed = re.sub(r',\n', ",", data)
print(parsed)
It returns:
":T1","A1","B1","C1"
":T2","A2","B2","C2","D2","E2"
":T3","A3","B3","C3","D3"
":T4","A4"
This conforms to the CSV standard and can be parsed easily.
Answer 1 (score: 1)
Another "trick", using the end parameter of the print function:
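To finish the job, the merged text still has to go through csv.reader, and with millions of lines it may be preferable not to hold the whole file in one string. The following is a minimal sketch of the same merge done line by line. It assumes every continuation is marked by a trailing comma and that no quoted field contains a newline. The helper name merge_continuations is made up for illustration and is not part of the csv module.

```python
import csv
import io

def merge_continuations(lines):
    """Yield logical lines, joining any physical line that ends with a comma
    to the line(s) that follow it. (Hypothetical helper, not part of csv.)"""
    pending = ''
    for line in lines:
        pending += line.rstrip('\r\n')
        if pending.endswith(','):
            continue  # trailing comma: the logical row continues on the next line
        if pending:
            yield pending
        pending = ''
    if pending:
        # Input ended in the middle of a continuation; emit what we have.
        yield pending.rstrip(',')

# The sample data from the question, wrapped in a file-like object.
sample = io.StringIO(
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n'
)

# csv.reader accepts any iterable of strings, so the generator streams straight in.
rows = list(csv.reader(merge_continuations(sample)))
for row in rows:
    print(', '.join(row))
```

With a real file, the StringIO object would be replaced by the handle from open('2.dat', newline=''), and the rows could be consumed one at a time in a for loop instead of being collected with list().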
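import csv
with open('2.dat', 'r') as f:
    reader = csv.reader(f)
    for i, r in enumerate(reader):
        if r[0].startswith(':T'):
            # A new logical row: close the previous one with a separator line.
            if i > 0: print('\n', '*'*30, sep='')
            print(', '.join(r), end='')
        else:
            # A continuation: keep printing on the same output line.
            print(', '.join(r), end='')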
Output:
:T1, A1, B1, C1
******************************
:T2, A2, B2, C2, D2, E2
******************************
:T3, A3, B3, C3, D3
******************************
:T4, A4
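Answer 2 (score: 0)
I don't think there is a magic CSV parser that will do exactly what you want. You will have to do a little work yourself.
Create a new, empty list of rows. Loop over the rows in datreader. If a row starts with :, append it to the new list. If not, join it onto the last row in the new list.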
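The steps just described could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name logical_rows is invented here, a logical row is taken to start whenever its first field begins with ':', and the empty last field that csv.reader produces for a trailing comma is treated as a continuation marker and dropped (so a genuinely empty final field would be lost as well).

```python
import csv
import io

def logical_rows(reader):
    """Group physical CSV rows into logical rows. A new logical row begins
    whenever the first field starts with ':'. (Illustrative helper.)"""
    merged = []
    for row in reader:
        if row and row[-1] == '':
            row = row[:-1]  # assumption: empty last field == trailing comma
        if not row:
            continue  # blank line, or a line holding only a comma
        if row[0].startswith(':') or not merged:
            merged.append(list(row))  # start a new logical row
        else:
            merged[-1].extend(row)    # continuation of the previous row
    return merged

# The sample data from the question.
sample_text = (
    '":T1","A1","B1","C1"\n'
    '":T2","A2","B2","C2",\n'
    '"D2","E2"\n'
    '":T3","A3","B3","C3",\n'
    '"D3"\n'
    '":T4","A4"\n'
)

result = logical_rows(csv.reader(io.StringIO(sample_text)))
for row in result:
    print(', '.join(row))
    print('*' * 31)
```

Because this version keeps every row in a list, a multi-million-line input might be better served by turning the loop into a generator that yields each logical row once the next ':' row appears.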
Answer 3 (score: 0)
This should do it:
:T1, A1, B1, C1
:T2, A2, B2, C2, D2, E2
:T3, A3, B3, C3, D3
:T4, A4