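I'm trying to find a simple way to chain file-like objects. I have a single CSV file that is split into several segments on disk. I'd like to be able to pass them to csv.DictReader without having to concatenate them first.
Something like: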
    files = map(io.open, filenames)
    for row in csv.DictReader(io.chain(files)):
        print(row[column_name])
But I can't find anything like io.chain. If I were parsing the lines myself, I could do something like this:
    from itertools import chain

    def lines(fp):
        for line in fp.readlines():
            yield line

    a = open('segment-1.dat')
    b = open('segment-2.dat')
    for line in chain(lines(a), lines(b)):
        row = line.strip().split(',')
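But DictReader needs something it can call read() on, so this approach doesn't work. I could loop over the files and copy the fieldnames attribute from the previous reader, but I'm hoping for something that lets me keep all the processing inside a single loop body.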
Answer 0 (score: 1)
An iterator might help:
    from io import BytesIO

    a = BytesIO(b"1st file 1st line \n1st file 2nd line")
    b = BytesIO(b"2nd file 1st line \n2nd file 2nd line")

    class Reader:
        def __init__(self, *files):
            self.files = files
            self.current_idx = 0

        def __iter__(self):
            return self

        def __next__(self):
            f = self.files[self.current_idx]
            # Return the next line of the current file, if there is one.
            for line in f:
                return line
            else:
                # The current file is exhausted: move on to the next one, if any.
                if self.current_idx < len(self.files) - 1:
                    self.current_idx += 1
                    return next(self)
                raise StopIteration("feed me more files")

    r = Reader(a, b)
    for l in r:
        print(l)
Result:
    b'1st file 1st line \n'
    b'1st file 2nd line'
    b'2nd file 1st line \n'
    b'2nd file 2nd line'
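The Reader above yields bytes because it is fed BytesIO objects, while the csv module works on lines of str. As far as I can tell from the csv docs, csv.DictReader does not actually need a read() method, only an iterable that yields one string per line. Here is a minimal sketch of that, using io.StringIO objects as stand-ins for the real segment files; the column names and values are made up for illustration:

```python
import csv
import io
from itertools import chain

# Stand-ins for two segments of one CSV file (hypothetical data).
# Only the first segment carries the header row.
a = io.StringIO("name,qty\napple,1\n")
b = io.StringIO("pear,2\n")

# chain(a, b) yields the lines of a, then the lines of b.
# DictReader takes the header from the first line it sees.
rows = list(csv.DictReader(chain(a, b)))

for row in rows:
    print(row["name"], row["qty"])
```

If this holds for your files, the chain(lines(a), lines(b)) idea from the question should work with DictReader directly.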
Edit:
:D And then there's what the standard library already gives you:
https://docs.python.org/3.7/library/fileinput.html
    import fileinput

    with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
        for line in f:
            process(line)
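To connect this back to the question, here is a rough sketch of feeding fileinput straight into csv.DictReader. The helper name iter_rows is my own, and I'm assuming only the first segment contains the header row:

```python
import csv
import fileinput


def iter_rows(filenames):
    """Yield one dict per CSV row across all segment files, in order.

    iter_rows is a hypothetical helper name, not part of any library.
    """
    # fileinput.input() yields text lines from each file in turn, and
    # csv.DictReader accepts any iterable of strings.
    # Caveat: if filenames is empty, fileinput falls back to reading stdin.
    with fileinput.input(files=filenames) as lines:
        yield from csv.DictReader(lines)
```

You would then loop with something like for row in iter_rows(('segment-1.dat', 'segment-2.dat')). One thing to watch: the csv docs recommend opening files with newline='', which fileinput.input() does not let you set, so quoted fields with embedded line breaks may not survive unchanged.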
Answer 1 (score: 1)
You can create a class that is an iterator which returns a string each time its __next__() method is called (quoting the docs).
    import csv

    class ChainedCSVfiles:
        def __init__(self, filenames):
            self.filenames = filenames
            self._lines = self._read_lines()

        def __iter__(self):
            return self

        def __next__(self):
            # Return one line (a string) per call, which is what csv expects.
            return next(self._lines)

        def _read_lines(self):
            # Generator that walks through every file in order, line by line.
            for filename in self.filenames:
                with open(filename, 'r', newline='') as csvfile:
                    for line in csvfile:
                        yield line

    filenames = 'segment-1.dat', 'segment-2.dat'
    reader = csv.DictReader(ChainedCSVfiles(filenames),
                            fieldnames=('field1', 'field2', 'field3'))
    for row in reader:
        print(row)
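If you don't need a class, the same idea can probably be written as a plain generator function. This is just a sketch: iter_lines and read_rows are names I made up, and leaving fieldnames as None assumes the first segment starts with a header row:

```python
import csv


def iter_lines(filenames):
    """Yield every line of every file, in order (hypothetical helper)."""
    for filename in filenames:
        # Each file is opened only when reached and closed once exhausted.
        with open(filename, 'r', newline='') as csvfile:
            yield from csvfile


def read_rows(filenames, fieldnames=None):
    """Return a DictReader over all segments (hypothetical helper).

    With fieldnames=None, DictReader takes the header from the first
    line of the first file.
    """
    return csv.DictReader(iter_lines(filenames), fieldnames=fieldnames)
```

Used like for row in read_rows(('segment-1.dat', 'segment-2.dat')), this keeps all the processing in one loop body and only has one file open at a time.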