Question

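I have written a class to handle large files, and I want to create a "write" method for the class so that I can easily make changes to the data in the file and then write out a new file.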

What I would like to be able to do is:

1.) Read in the original file

sources = Catalog(<filename>)

2.) Make changes to the data contained in the file

for source in sources:
    source['blah1'] = source['blah1'] + 4

3.) Write the updated values out to a new file

sources.catalog_write(<new_filename>)

To do this, I wrote a fairly simple generator:

class Catalog(object):
    def __init__(self, fname):
        self.data = open(fname, 'r')

        self.header = ['blah1', 'blah2', 'blah3']

    def next(self):
        line = self.data.readline()
        line = line.lstrip()
        if line == "":
            self.data.close()
            raise StopIteration()

        cols = line.split()
        if len(cols) != len(self.header):
            print "Input catalog is not valid."
            raise StopIteration()

        for element, col in zip(self.header, cols):
            self.__dict__.update({element:float(col)})

        return self.__dict__.copy()

    def __iter__(self):
        return self

Here is my attempt at the write method:

def catalog_write(self, outname):
    with open(outname, "w") as out:
        out.write("    ".join(self.header) + "\n")
        for source in self:
            out.write("    ".join(map(str, source)) + "\n")

But when I try to call that method, I get the following error:

 File "/Catalogs.py", line 53, in catalog_write
    for source in self:
  File "/Catalogs.py", line 27, in next
    line = self.data.readline()
ValueError: I/O operation on closed file

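I realize this is because generators are generally a one-shot deal. I know there are workarounds (like this question and this post), but I am not sure what the best way to do this is. These files are very large, and I would like reading them in and using them to be as efficient as possible (in both time and memory). Is there a Pythonic way to do this?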

Answer 1

Assumptions made:

Input file: [infile]

1.2 3.4 5.6
4.5 6.7 8.9

Usage:

>>> a = Catalog('infile')
>>> a.catalog_write('outfile')

Output file now: [outfile]

blah1 blah2 blah3
1.2 3.4 5.6
4.5 6.7 8.9

Writing it again to another file: [outfile2]

>>> a.catalog_write('outfile2')

Output file now: [outfile2]

blah1 blah2 blah3
1.2 3.4 5.6
4.5 6.7 8.9

So, based on what you posted, you need to reopen data [assuming it is a file object for the file whose name is stored in self.fname].

Modify your __init__ to save fname as an attribute.

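Create a placeholder data object initially. I did not open the file in __init__, so that all opening and closing happens inside the next() method as needed. I made data an object that can carry a closed attribute, just like a file object, so that next() can check whether self.data.closed is True, reopen the file, and read from it again.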

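def __init__(self, fname):
    self.fname = fname
    # Placeholder that can carry a `closed` attribute. A plain object()
    # does not accept new attributes, so a function object is used instead.
    self.data = lambda: None
    self.data.closed = True
    self.header = ['blah1', 'blah2', 'blah3']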
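The lambda works as a placeholder only because function objects accept arbitrary attributes. A possibly clearer sketch of the same idea, assuming Python 3.3 or later for types.SimpleNamespace (this is not part of the original answer), is:

```python
from types import SimpleNamespace


class Catalog(object):
    def __init__(self, fname):
        self.fname = fname
        # Stand-in for a file object that is "already closed"; the real
        # file is opened lazily by the iteration method, not here.
        self.data = SimpleNamespace(closed=True)
        self.header = ['blah1', 'blah2', 'blah3']
```

The rest of the class can keep checking self.data.closed exactly as before, since both the placeholder and a real file object expose that attribute.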

Now, the next method is modified as follows:

def next(self):
    if self.data.closed:
        self.data = open(self.fname, "r")
    line = self.data.readline()
    line = line.lstrip()
    if line == "":
        if not self.data.closed:
            self.data.close()
        raise StopIteration()

    cols = line.split()
    if len(cols) != len(self.header):
        print "Input catalog is not valid."
        if not self.data.closed:
            self.data.close()
        raise StopIteration()

    for element, col in zip(self.header, cols):
        self.__dict__.update({element:float(col)})

    return self.__dict__.copy()
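As an alternative to tracking the closed state by hand, __iter__ can itself be written as a generator function that opens the file each time a loop starts. The following is only a sketch under stated assumptions, not the original answer's method: it targets Python 3, the class name ReiterableCatalog is hypothetical, and unlike the code above it skips blank lines and raises ValueError on a malformed row instead of stopping silently.

```python
class ReiterableCatalog(object):
    """Hypothetical variant of Catalog: every `for` loop re-reads the file."""

    def __init__(self, fname):
        self.fname = fname
        self.header = ['blah1', 'blah2', 'blah3']

    def __iter__(self):
        # A fresh file handle is opened for each iteration and closed by
        # the with-block when the loop finishes.
        with open(self.fname, 'r') as data:
            for line in data:
                cols = line.split()
                if not cols:
                    continue  # skip blank lines (assumption, see above)
                if len(cols) != len(self.header):
                    raise ValueError("Input catalog is not valid.")
                # Yield one row at a time, so memory use stays small.
                yield dict(zip(self.header, map(float, cols)))
```

Because only one line is held in memory at a time, this should stay efficient for large files, and the same object can be looped over as many times as needed.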

Your catalog_write method should look like this:

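Note that any modification of the data must be done inside the for loop, as shown. Each call to next() returns a fresh copy of the row that is not stored anywhere, so changes made in an earlier, separate loop are lost by the time the file is written.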

def catalog_write(self, outname):
    with open(outname, "w") as out:
        out.write("    ".join(self.header) + "\n")
        for source in self:
            source['blah1'] = 444 # Data modified.
            out.write("    ".join(str(source[name]) for name in self.header) + "\n")

I am assuming you want the updated values written as columns under the header names in the outname file.
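If hard-coding the change inside catalog_write is too rigid, one option is to pass the modification in as a function. The sketch below is an assumption-laden illustration rather than part of the original answer: write_catalog, its modify parameter, and add_four are hypothetical names, and the code is written for Python 3.

```python
def write_catalog(rows, header, outname, modify=None):
    """Write an iterable of row dicts to outname, one row at a time.

    `modify`, if given, is called on each row dict before it is written.
    """
    with open(outname, "w") as out:
        out.write("    ".join(header) + "\n")
        for row in rows:
            if modify is not None:
                modify(row)  # apply the caller's change in place
            out.write("    ".join(str(row[name]) for name in header) + "\n")


def add_four(row):
    # Example change taken from the question: blah1 = blah1 + 4
    row['blah1'] = row['blah1'] + 4
```

With a catalog object that yields row dicts, a call such as write_catalog(sources, sources.header, 'outfile', modify=add_four) would read, modify, and write each row in a single pass.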

"Write" method for a generator in Python