Question

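I'm trying to find a simple way to chain file-like objects. I have a single CSV file that is split into several segments on disk. I'd like to be able to pass them to csv.DictReader without having to build a concatenated copy first.

Something like: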

# Wishful pseudocode: io.chain does not exist.
files = map(io.open, filenames)
for row in csv.DictReader(io.chain(files)):
    print(row[column_name])

But I can't find anything like io.chain. If I were parsing the lines myself, I could do something like this:

from itertools import chain

def lines(fp):
    for line in fp.readlines():
        yield line

a = open('segment-1.dat')
b = open('segment-2.dat')
for line in chain(lines(a), lines(b)):
    row = line.strip().split(',')

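But DictReader needs something it can call read() on, so this approach doesn't work. I could iterate over the files, copying the fieldnames attribute from the previous reader, but I'd prefer something that lets me keep all the processing inside a single loop body.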

Answer 1

Iterating might help:

from io import BytesIO


a = BytesIO(b"1st file 1st line \n1st file 2nd line")
b = BytesIO(b"2nd file 1st line \n2nd file 2nd line")

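class Reader:
    """Iterate over the lines of several file-like objects in sequence."""

    def __init__(self, *files):
        self.files = files
        self.current_idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        f = self.files[self.current_idx]
        # Return the next line of the current file, if it has one.
        for line in f:
            return line
        # The current file is exhausted: move on to the next one, if any.
        if self.current_idx < len(self.files) - 1:
            self.current_idx += 1
            return next(self)
        raise StopIteration("feed me more files")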

r = Reader(a, b)

for l in r:
    print(l)

Result:

b'1st file 1st line \n'
b'1st file 2nd line'
b'2nd file 1st line \n'
b'2nd file 2nd line'

Edit:

:D And then there's the gift from the standard library:

https://docs.python.org/3.7/library/fileinput.html


import fileinput

with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
    for line in f:
        process(line)  # process() stands in for your own handling
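
To tie this back to the question, here is a minimal sketch of feeding fileinput straight into csv.DictReader. It assumes the header row appears only in the first segment; the helper name read_rows, the temporary files, and the column names are made up for the demo. One caveat: fileinput opens files in ordinary text mode, so fields containing embedded newlines may not survive untouched.

```python
import csv
import fileinput
import os
import tempfile


def read_rows(filenames):
    """Return every row from the given CSV segments as a list of dicts."""
    # fileinput.input() yields the lines of each file in turn, and
    # csv.DictReader accepts any iterable that yields strings.
    with fileinput.input(files=filenames) as lines:
        return list(csv.DictReader(lines))


# Hypothetical demo data: one CSV split in two, header only in the first part.
tmpdir = tempfile.mkdtemp()
seg1 = os.path.join(tmpdir, "segment-1.dat")
seg2 = os.path.join(tmpdir, "segment-2.dat")
with open(seg1, "w", newline="") as f:
    f.write("name,qty\napple,1\n")
with open(seg2, "w", newline="") as f:
    f.write("pear,2\n")

rows = read_rows([seg1, seg2])
for row in rows:
    print(row["name"], row["qty"])
```

If every segment repeats the header, this sketch would return the repeated header lines as data rows, so they would need to be filtered out separately.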

Answer 2

You can create a class whose iterator returns a string each time its __next__() method is called (quoting the docs).

import csv


class ChainedCSVfiles:
    """Iterable that yields the lines of each file in turn."""

    def __init__(self, filenames):
        self.filenames = filenames

    def __iter__(self):
        # Generator method: each iter() call starts a fresh pass over the files.
        for filename in self.filenames:
            with open(filename, 'r', newline='') as csvfile:
                for line in csvfile:
                    yield line


filenames = 'segment-1.dat', 'segment-2.dat'
reader = csv.DictReader(ChainedCSVfiles(filenames),
                        fieldnames=('field1', 'field2', 'field3'))
for row in reader:
    print(row)
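
Since csv.DictReader only needs an iterable of strings, the same effect can probably be had without a custom class by chaining the open file objects with itertools.chain.from_iterable. The following is a sketch under that assumption, not a drop-in replacement: the helper name chained_rows, the temporary files, and the column names are invented for the demo, and the header is assumed to live only in the first segment.

```python
import csv
import os
import tempfile
from contextlib import ExitStack
from itertools import chain


def chained_rows(filenames):
    """Yield rows from several CSV segments as if they were one file."""
    with ExitStack() as stack:
        # Open every segment; ExitStack closes them all when we are done.
        files = [stack.enter_context(open(name, newline=""))
                 for name in filenames]
        # File objects iterate line by line, so chaining them produces one
        # continuous stream of lines for DictReader.
        yield from csv.DictReader(chain.from_iterable(files))


# Hypothetical demo data: header in the first segment only.
demo_dir = tempfile.mkdtemp()
part1 = os.path.join(demo_dir, "segment-1.dat")
part2 = os.path.join(demo_dir, "segment-2.dat")
with open(part1, "w", newline="") as f:
    f.write("field1,field2\na,b\n")
with open(part2, "w", newline="") as f:
    f.write("c,d\n")

all_rows = list(chained_rows([part1, part2]))
for row in all_rows:
    print(row["field1"], row["field2"])
```

Unlike the class above, this opens all segments up front, which is simpler but holds one file handle per segment for the duration of the loop.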

How to chain file objects in Python?