Question

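I'm trying to use the fileinput module's inplace filtering feature to rewrite an input file in place.

I need to set the encoding (for both reading and writing) to latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but was stopped by the error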

ValueError: FileInput cannot use an opening hook in inplace mode

On closer inspection, I see that the fileinput documentation states this clearly: you cannot use inplace and openhook together.

How can I get around this?

Answer 1

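As far as I know, there is no way around this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename(), and os.remove():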

import os
import codecs

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

def do_processing(line):
    """Placeholder: replace with your own line processing."""
    return line

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
     codecs.open(tmp_name, 'w', encoding='latin-1') as fo:

    for line in fi:
        new_line = do_processing(line) # do your line processing here
        fo.write(new_line)

os.remove(input_name) # remove original
os.rename(tmp_name, input_name) # rename temp to original name

You can also specify a different encoding for the output file if you want to change it, or keep it as latin-1 when opening the output file if you don't.
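To illustrate changing the output encoding, here is a minimal sketch that assumes Python 3 and the built-in open(). The function name reencode and the '.tmp' suffix are illustrative and do not come from the answer. It reads the file as latin-1 and writes it back as utf-8:

```python
import os
import tempfile

def reencode(path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Rewrite `path` so that its text is stored in `dst_encoding`."""
    tmp_path = path + ".tmp"  # illustrative temp name, next to the original
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=src_encoding, newline="") as src, \
         open(tmp_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
    os.replace(tmp_path, path)  # swap the converted file into place

# Small demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("café\n".encode("latin-1"))
reencode(demo_path)
with open(demo_path, "rb") as f:
    converted = f.read()
os.remove(demo_path)
```

os.replace is used here instead of the remove/rename pair because it overwrites the destination in one call; it requires Python 3.3 or later.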

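I know this isn't the in-place modification you were looking for, but it accomplishes the task you are trying to do and is very flexible.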

Answer 2

This is very similar to the other answer, just done as a function so that it can easily be called multiple times:

import codecs
import os

def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                yield line, new
    # Runs only once the caller has consumed every line.
    os.rename(new_path, orig_path)

Here is what it looks like in use:

for line, new in inplace(path):
    line = do_processing(line)  # Use your imagination here.
    new.write(line)

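As long as you specify the correct encoding for your data (in my case I actually needed utf-8 everywhere, but your needs are obviously different), this works, including on Python 2.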

Answer 3

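I'm not crazy about the existing solutions that use rename/remove, because they oversimplify some of the file handling that the inplace flag does, e.g. handling the file mode, handling chmod attributes, etc.

In my case, because I control the environment my code will run in, I decided the only reasonable solution was to set my locale to one that uses UTF-8: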
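For readers who do stay with the temp-file approach, one of these gaps (permission bits) can be narrowed with shutil.copymode. This is a rough sketch under the assumption of Python 3 on a POSIX system; rewrite_preserving_mode and transform are hypothetical names, and ownership, ACLs and error cleanup are not handled:

```python
import os
import shutil
import stat
import tempfile

def rewrite_preserving_mode(path, transform, encoding="latin-1"):
    """Rewrite `path` through a temp file, keeping its permission bits."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    with open(path, "r", encoding=encoding, newline="") as src, \
         tempfile.NamedTemporaryFile("w", encoding=encoding, newline="",
                                     dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(transform(line))
    shutil.copymode(path, tmp.name)  # copy permission bits from the original
    os.replace(tmp.name, path)       # swap the rewritten file into place

# Small demonstration on a throwaway file with non-default permissions.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("café\n".encode("latin-1"))
os.chmod(demo_path, 0o640)
rewrite_preserving_mode(demo_path, str.upper)
mode_after = stat.S_IMODE(os.stat(demo_path).st_mode)
with open(demo_path, "rb") as f:
    content_after = f.read()
os.remove(demo_path)
```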

export LC_ALL=en_US.UTF-8

The effect is:

sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)

sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done

sh-4.2#

The potential side effect is that other file input and output changes as well, but I'm not worried about that.
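The locale matters here because, when no encoding is given, open() falls back to the locale's preferred encoding on the Python versions shown in this thread, and fileinput without an openhook opens files the same way. A quick way to check what a given environment will use is this sketch:

```python
import locale

# Encoding that open() uses when no `encoding` argument is passed
# (on Python versions where the default is taken from the locale).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)
```

Running this before and after exporting LC_ALL shows whether the change has taken effect for the Python process.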

Answer 4

If you don't mind using a library from pip, the in_place library supports encodings.

import in_place

with in_place.InPlace(filename, encoding="utf-8") as fp:
  for line in fp:
    fp.write(line)

Combining inplace filtering and encoding settings in the fileinput module
