Are there any alternatives to the csv module for reading a CSV file in Python 3 in a streaming fashion? Currently, my data looks like this:
"field1"::"field2"::"field3"\x02\n
"1"::"hi\n"::"3"\x02\n
"8"::"ok"::"3"\x02\n
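The delimiter is two characters, :: (the csv module only accepts a single-character delimiter), and the line terminator is also two characters, \x02\n. Is there a CSV reader for Python that supports this in streaming mode?
Here is an example of what I'm trying to do: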
>>> import csv
>>> s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''
>>> csvreader=csv.reader(s, delimiter='::', lineterminator='\x02\n')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: "delimiter" must be a 1-character string
Loading this CSV with pandas seems to take about 100x as long, so I'd like to see what other options exist.
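Answer 0 (score: 1)
As you have found, the csv library does not handle this data format directly. You can, however, preprocess the data on the fly. For example, the following should work: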
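from io import StringIO
import csv

s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''

def csv_reader_alt(source):
    # Lazily drop the \x02 marker and collapse '::' to ':' so csv.reader can parse it.
    # Caveat: a ':' in an unquoted field, or '::' or \x02 inside any field, gets altered.
    cleaned = (line.replace('\x02', '').replace('::', ':') for line in source)
    return csv.reader(cleaned, delimiter=':')

for row in csv_reader_alt(StringIO(s)):
    if row:  # skip the empty rows produced by the blank lines in s
        print(row)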
This gives you the following output:
['field1', 'field2', 'field3']
['1', 'hi\n', '3']
['8', 'ok', '3']
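A possible variation on the approach above, as a minimal sketch rather than a definitive fix: instead of collapsing `::` to `:`, map it to a placeholder character and map it back after parsing, so that colons and `::` inside quoted values survive. The names `UNIT_SEP` and `iter_rows` are hypothetical, and the sketch assumes the character `\x1f` never occurs in the data and that every record ends with `\x02\n`.

```python
import csv
from io import StringIO

# Assumption: this control character never appears in the real data.
UNIT_SEP = '\x1f'

def iter_rows(source):
    """Lazily yield rows from an iterable of text lines ('::' delimiter, '\\x02\\n' terminator)."""
    def cleaned(lines):
        for line in lines:
            # Only strip \x02 when it is part of the record terminator.
            if line.endswith('\x02\n'):
                line = line[:-2] + '\n'
            yield line.replace('::', UNIT_SEP)

    for row in csv.reader(cleaned(source), delimiter=UNIT_SEP):
        if row:  # skip empty rows from blank lines
            # Restore any '::' that was inside a quoted field.
            yield [field.replace(UNIT_SEP, '::') for field in row]

sample = '"field1"::"field2"::"field3"\x02\n"1"::"hi\n"::"3"\x02\n"8"::"a:b::c"::"3"\x02\n'
for row in iter_rows(StringIO(sample)):
    print(row)
```

Because `iter_rows` only ever looks at one line at a time, it should work the same way on an open file object as on the `StringIO` used here.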
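Answer 1 (score: 1)
@MartinEvans shows a good approach in their answer.
The following code reads from a file (rather than from an in-memory string) with proper file handling, splitting on a custom line delimiter by means of a custom generator: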
def get_line(file, delimiter='\n', bufsize=4096):
    # https://stackoverflow.com/a/19600562/9225671
    buf = ''
    while True:
        chunk = file.read(bufsize)
        if len(chunk) == 0:
            # end of file has been reached; serve the remaining data and exit
            yield buf
            return

        buf += chunk
        line_list = buf.split(delimiter)
        # don't serve the last part yet, first we need to read more chunks from the file
        buf = line_list.pop(-1)

        for line in line_list:
            yield line
if __name__ == '__main__':
    with open('my_file.csv') as f:
        for line in get_line(f, delimiter='\x02\n'):
            if len(line) > 0:
                parts = line.split('::')
                print(parts)
                print([
                    e.strip('"')
                    for e in parts])
Does this work for you?
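One limitation of `line.split('::')` followed by `strip('"')` is that a `::` inside a quoted value would be split, and doubled quotes would not be unescaped. A minimal sketch of one way around that, assuming every field is double-quoted as in the sample data: pull the quoted fields out of each record with a regular expression. The names `iter_records`, `parse_record` and `QUOTED_FIELD` are hypothetical, and `iter_records` is just a compact restatement of the chunked generator above.

```python
import re
from io import StringIO

# Matches one double-quoted field; "" inside a field stands for a literal quote.
QUOTED_FIELD = re.compile(r'"((?:[^"]|"")*)"')

def iter_records(stream, terminator='\x02\n', bufsize=4096):
    """Yield terminator-separated records from a text stream, one chunk at a time."""
    buf = ''
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            if buf:
                yield buf  # trailing data without a final terminator
            return
        buf += chunk
        # Keep the last (possibly incomplete) piece in the buffer.
        *complete, buf = buf.split(terminator)
        yield from complete

def parse_record(record):
    """Return the quoted fields of one record (assumes all fields are quoted)."""
    return [field.replace('""', '"') for field in QUOTED_FIELD.findall(record)]

sample = '"field1"::"field2"::"field3"\x02\n"1"::"hi\n"::"3"\x02\n"8"::"a::b"::"3"\x02\n'
for record in iter_records(StringIO(sample), bufsize=5):
    if record:
        print(parse_record(record))
```

The small `bufsize` in the usage lines is only there to exercise the case where a terminator straddles two chunks; for a real file the default is probably a more sensible choice. Unquoted fields are silently ignored by this pattern, so it is not a general CSV parser.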