Question

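I'm working on a problem and have gotten stuck.

I have a (potentially large) set of text files. I need to apply a sequence of filters and transformations to them and then export the result somewhere else.

So I have roughly: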

def apply_filter_transformer(basepath = None, newpath = None, fts= None):
    #because all the raw studies in basepath should not be modified, so I first cp all to newpath
    for i in listdir(basepath):
        file(path.join(newpath, i), "wb").writelines(file(path.join(basepath, i)).readlines())
    for i in listdir(newpath):
        fileobj = open(path.join(newpath, i), "r+")
        for fcn in fts:
            fileobj = fcn(fileobj)
        if fileobj is not None:
            fileobj.writelines(fileobj.readlines())
        try:
            fileobj.close()
        except:
            print i, "at", fcn
            pass
def main():
    apply_filter_transformer(path.join(pardir, pardir, "studies"),
                         path.abspath(path.join(pardir, pardir, "filtered_studies")),
                         [
                        #transformer_addMemo,
                          filter_executable,
                          transformer_identity,
                          filter_identity,
                          ])

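Here fts in apply_filter_transformer is a list of functions, each of which takes a Python file object and returns a Python file object. The problem is that when I try to insert a string into the text, I get an error I can't explain, and I've been stuck on it all morning.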

def transformer_addMemo(fileobj):
    STYLUSMEMO =r"""hellow world"""
    study = fileobj.read()
    location = re.search(r"</BasicOptions>", study)
    print fileobj.name
    print fileobj.mode
    fileobj.seek(0)
    fileobj.write(study[:location.end()] + STYLUSMEMO + study[location.end():])
    return fileobj

This gives me:

Traceback (most recent call last):
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 292, in <module>
    main()
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 288, in main
    filter_identity,
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 276, in apply_filter_transformer
    fileobj.writelines(fileobj.readlines())
IOError: [Errno 0] Error

I'd be very grateful if anyone could give me more information about this error.

Answer 1

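There is a handy Python module for reading or modifying a set of files: fileinput.

I'm not sure what causes this error. However, you are reading each whole file into memory, which is a bad idea in your case because the files may be large. With fileinput you can easily rewrite files in place. For example: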

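import fileinput
import sys

# list_of_files, keyword and my_text are placeholders for your own values.
# With inplace=True, whatever is written to sys.stdout replaces the file's contents.
for line in fileinput.input(list_of_files, inplace=True):
    sys.stdout.write(line)
    if keyword in line:
        sys.stdout.write(my_text)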
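To make the snippet above concrete, here is a minimal self-contained sketch of the same fileinput idea, written for Python 3. The helper name insert_after_keyword, the demo file name and its contents are made up for illustration; only fileinput.input with inplace=True comes from the answer itself.

```python
import fileinput
import os
import tempfile


def insert_after_keyword(paths, keyword, extra_text):
    """Rewrite each file in place, adding extra_text after every line containing keyword."""
    # With inplace=True, fileinput redirects print() output into the file being read,
    # so only one line needs to be held in memory at a time.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            print(line, end="")
            if keyword in line:
                print(extra_text, end="")


# Demo on a throwaway file (hypothetical contents, not taken from the question).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "study.xml")
with open(demo_path, "w") as f:
    f.write("<BasicOptions>\n</BasicOptions>\n<Other/>\n")

insert_after_keyword([demo_path], "</BasicOptions>", "hellow world\n")

with open(demo_path) as f:
    result = f.read()
```

Note that this works line by line, so the text is inserted after the whole line that contains the keyword rather than immediately after the tag. If the tag can share a line with other content, a per-line string replacement would be needed instead.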

Answer 2

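It's not really possible to tell what causes the error from the code you posted. The problem may lie in the protocol you've adopted for your transformation functions.

I'll simplify the code a bit: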

fileobj = open(path, mode)
fileobj = fcn(fileobj)
fileobj.writelines(fileobj.readlines())

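What guarantee do I have that fcn returns a file opened in the original file's mode? That it returns a file that is open at all? That it returns a file? Well, I have none.

There doesn't seem to be any reason for you to use file objects in your process. Since you're reading the whole file into memory anyway, why not have your transformation functions take and return strings? Then your code would look like this: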
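If you did want to keep passing file objects around, one way to get those guarantees is to check them after every step. The following is only a sketch for Python 3: apply_checked and closing_transformer are hypothetical names, and rewinding with seek(0) after each step is my own assumption about what the next function expects, not something from the question.

```python
import io
import os
import tempfile


def apply_checked(fcn, fileobj):
    """Call fcn(fileobj) and verify the result still honours the file-object protocol."""
    expected_mode = fileobj.mode
    result = fcn(fileobj)
    if not isinstance(result, io.IOBase):
        raise TypeError("%s returned %r, not a file object" % (fcn.__name__, result))
    if result.closed:
        raise ValueError("%s returned a closed file" % fcn.__name__)
    actual_mode = getattr(result, "mode", None)
    if actual_mode != expected_mode:
        raise ValueError("%s changed the mode to %r" % (fcn.__name__, actual_mode))
    # Assumption: each step should start from the beginning of the file.
    result.seek(0)
    return result


def transformer_identity(fileobj):
    """Well-behaved step: hands the same open file back."""
    return fileobj


def closing_transformer(fileobj):
    """Badly behaved step (for illustration): closes the file before returning it."""
    fileobj.close()
    return fileobj


# Throwaway file for the demo.
demo_path = os.path.join(tempfile.mkdtemp(), "study.txt")
with open(demo_path, "w") as f:
    f.write("some text\n")

with open(demo_path, "r+") as f:
    checked = apply_checked(transformer_identity, f)
    first_line = checked.readline()
```

A failing check at least names the function that broke the contract, which the bare except in the question's loop does not.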

with open(filename, "r") as f:
    s = f.read()
for transform_function in transforms:
    s = transform_function(s)
with open(filename, "w") as f:
    f.write(s)

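Among other things, this completely decouples the file I/O part of the program from the data transformation part, so a problem in one doesn't affect the other.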
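As a rough illustration of that string-based design, here is how the question's transformer_addMemo might look as a string-in, string-out function (Python 3). The names transformer_add_memo and transform_file are hypothetical, and returning the text unchanged when the tag is missing is my assumption; the original would raise in that case because location would be None.

```python
import os
import re
import tempfile

STYLUSMEMO = "hellow world"  # literal taken from the question


def transformer_add_memo(text):
    """Return text with STYLUSMEMO inserted right after </BasicOptions>, if present."""
    match = re.search(r"</BasicOptions>", text)
    if match is None:
        return text  # assumption: leave files without the tag unchanged
    return text[:match.end()] + STYLUSMEMO + text[match.end():]


def transform_file(src_path, dst_path, transforms):
    """Read src_path, run every transform over its text, write the result to dst_path."""
    with open(src_path, "r") as f:
        text = f.read()
    for transform in transforms:
        text = transform(text)
    with open(dst_path, "w") as f:
        f.write(text)


# Demo with throwaway files; the raw file is never opened for writing.
work_dir = tempfile.mkdtemp()
raw_path = os.path.join(work_dir, "raw.xml")
out_path = os.path.join(work_dir, "filtered.xml")
with open(raw_path, "w") as f:
    f.write("<BasicOptions></BasicOptions><Other/>")

transform_file(raw_path, out_path, [transformer_add_memo])

with open(out_path) as f:
    filtered = f.read()
```

Because the source and destination paths are separate, the initial copy from basepath to newpath is no longer needed, and each transformer can be tested on plain strings without touching the disk.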

Inserting a string into the middle of a file, given a file object
