Replacing words in a UTF-8 encoded text file using a dictionary

Time: 2018-08-04 22:19:44

Tags: python dictionary replace file-io runtime-error

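I'm trying to open a text file, read through it, and replace certain strings with strings stored in a dictionary. My approach is based on the answers to Replacing words in text file using a dictionary and How to search and replace text in a file using Python?

Like this: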

# edited: changed the print statement to print(line)
import fileinput

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

for line in fileinput.input(text, inplace=True):
    line = line.rstrip()
    for field in fields:
        if field in line:
            line = line.replace(field, fields[field])

    print (line)

My file is encoded in UTF-8.

When I run this, the console shows this error:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
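The 'charmap' wording suggests the file was decoded with a legacy locale codec instead of UTF-8; on Windows that is often cp1252. A minimal sketch reproducing the same message, assuming cp1252 was the codec in use (the question does not say which one it was):

```python
# Assumption: the error came from decoding UTF-8 bytes with cp1252,
# which Python implements as a "charmap" codec.
utf8_bytes = "Á".encode("utf-8")  # b'\xc3\x81'; 0x81 has no cp1252 mapping

error_message = ""
try:
    utf8_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'charmap' codec can't decode byte 0x81 in position 1: character maps to <undefined>

# Decoding with the codec the file was actually written in succeeds.
print(utf8_bytes.decode("utf-8"))  # Á
```

So the fix is to make every open of the file say encoding="utf-8" explicitly rather than relying on the platform default.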

Adding encoding = "utf8" to fileinput.FileInput() gives this error:

TypeError: __init__() got an unexpected keyword argument 'encoding'

Adding openhook=fileinput.hook_encoded("utf8") to fileinput.FileInput() gives this error:

ValueError: FileInput cannot use an opening hook in inplace mode

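I don't want to just suppress the errors by passing 'ignore' as the error handler.

I have the file and the dictionary, and I want to write the replaced text back into the file, the same way print() writes to stdout.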
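If the goal is to keep a print()-based loop, one option (not taken from this thread) is to read the whole file with an explicit encoding first, then reopen it for writing and redirect stdout into it with contextlib.redirect_stdout. This sketch assumes the file fits in memory; replace_in_file is a hypothetical helper name:

```python
import contextlib
import os
import tempfile


def replace_in_file(path, fields, encoding="utf-8"):
    """Rewrite path in place while keeping print() as the output call."""
    # Read everything up front, so reopening with "w" cannot lose unread data.
    with open(path, encoding=encoding) as src:
        lines = src.readlines()

    # "w" truncates the file; print() output is redirected into it.
    with open(path, "w", encoding=encoding) as dst, contextlib.redirect_stdout(dst):
        for line in lines:
            line = line.rstrip("\n")
            for field, replacement in fields.items():
                line = line.replace(field, replacement)
            print(line)  # written to dst, not the console


# Usage with a throwaway copy of part of the sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample file.txt")
    with open(sample, "w", encoding="utf-8") as fh:
        fh.write("This is a greeting to the world.\nHello world!\n")
    replace_in_file(sample, {"world": "earth"})
    with open(sample, encoding="utf-8") as fh:
        result = fh.read()

print(result)
```

Note that print() always ends with a newline, so a file without a trailing newline gains one.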

Source file (UTF-8):

Plain text on the line in the file.
This is a greeting to the world.
Hello world!
Here's another plain text.
And here too!

I want to replace world with earth.

The dictionary: {"world": "earth"}

Modified file (UTF-8):

Plain text on the line in the file.
This is a greeting to the earth.
Hello earth!
Here's another plain text.
And here too!
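Separately from the encoding problem, calling str.replace() once per dictionary key can chain replacements: if one value contains another key, it gets replaced again. A sketch of a single-pass alternative using the standard re module, applied to the sample file above; replace_words is a hypothetical name, and it matches substrings rather than whole words (wrap the pattern in \b anchors if that matters):

```python
import re


def replace_words(text, replacements):
    """Replace every key of `replacements` in one pass over `text`."""
    if not replacements:
        return text
    # Longest keys first, so overlapping keys prefer the longer match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    # Each match is replaced once; replacement text is never re-scanned.
    return pattern.sub(lambda m: replacements[m.group(0)], text)


source = (
    "Plain text on the line in the file.\n"
    "This is a greeting to the world.\n"
    "Hello world!\n"
    "Here's another plain text.\n"
    "And here too!\n"
)
print(replace_words(source, {"world": "earth"}), end="")
```

With {"a": "b", "b": "c"}, the text "ab" becomes "bc" here, whereas chained str.replace() calls in that order would give "cc".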

2 answers:

Answer 0 (score: 0)

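The fileinput library has a few problems, which I addressed in the past in a blog post; one of them is that you can't set the encoding and use in-place file rewriting at the same time.

The following code can do this, but you'll have to replace the print() calls with writes to the outgoing file object: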

from contextlib import contextmanager
import io
import os


@contextmanager
def inplace(filename, mode='r', buffering=-1, encoding=None, errors=None,
            newline=None, backup_extension=None):
    """Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use 'w', 'a' or '+'; only read-only-modes are supported.

    """

    # move existing file to backup, create new file with same permissions
    # borrowed extensively from the fileinput module
    if set(mode).intersection('wa+'):
        raise ValueError('Only read-only file modes can be used')

    backupfilename = filename + (backup_extension or os.extsep + 'bak')
    try:
        os.unlink(backupfilename)
    except os.error:
        pass
    os.rename(filename, backupfilename)
    readable = io.open(backupfilename, mode, buffering=buffering,
                       encoding=encoding, errors=errors, newline=newline)
    try:
        perm = os.fstat(readable.fileno()).st_mode
    except OSError:
        writable = open(filename, 'w' + mode.replace('r', ''),
                        buffering=buffering, encoding=encoding, errors=errors,
                        newline=newline)
    else:
        os_mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
        if hasattr(os, 'O_BINARY'):
            os_mode |= os.O_BINARY
        fd = os.open(filename, os_mode, perm)
        writable = io.open(fd, "w" + mode.replace('r', ''), buffering=buffering,
                           encoding=encoding, errors=errors, newline=newline)
        try:
            if hasattr(os, 'chmod'):
                os.chmod(filename, perm)
        except OSError:
            pass
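    try:
        yield readable, writable
    except Exception:
        # close both handles first; Windows cannot delete or rename open files
        readable.close()
        writable.close()
        # move backup back
        try:
            os.unlink(filename)
        except os.error:
            pass
        os.rename(backupfilename, filename)
        raise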
    finally:
        readable.close()
        writable.close()
        try:
            os.unlink(backupfilename)
        except os.error:
            pass

So your code would look like this:

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

with inplace(text, encoding='utf8') as (infh, outfh):
    for line in infh:
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])

        outfh.write(line)

Note that you no longer need to strip the newlines.
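A variation on the same idea (a swapped-in technique, not the answerer's code): write to a temporary file in the same directory and swap it in with os.replace(), which needs Python 3.3+. The original stays intact until the very last step, so no backup restore logic is needed. rewrite_atomically is a hypothetical name:

```python
import os
import shutil
import tempfile


def rewrite_atomically(path, fields, encoding="utf-8"):
    """Stream path through the replacements into a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with open(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, encoding=encoding, newline="") as src:
            for line in src:
                for field, replacement in fields.items():
                    line = line.replace(field, replacement)
                dst.write(line)
        shutil.copymode(path, tmp_path)  # mkstemp makes the file owner-only
        os.replace(tmp_path, path)       # replaces the original in one rename
    except BaseException:
        os.unlink(tmp_path)  # the original file is left as it was
        raise


# Usage with a throwaway file using Windows line endings.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample file.txt")
    with open(sample, "w", encoding="utf-8", newline="") as fh:
        fh.write("Hello world!\r\nAnd here too!\r\n")
    rewrite_atomically(sample, {"world": "earth"})
    with open(sample, "rb") as fh:
        result = fh.read()
    leftovers = sorted(os.listdir(tmp))

print(result)
```

The rename is atomic on POSIX; on Windows os.replace() still overwrites the target, but without that guarantee.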

Answer 1 (score: 0)

I tried using this:

with open(fileName1, "r+", encoding = "utf8", newline='') as fileIn, open(fileName1, "r+", encoding = "utf8", newline='') as fileOut:
    for line in fileIn:             
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])
        fileOut.write(line)

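Note: when I use a single file, junk gets pushed to the end of the file. So far I haven't figured out why. The amount of junk doesn't match the number of replacements (there are more replacements than pieces of junk).

Pseudo-math: oriA

I'm working on fixing it.

Edit: when I use two files, everything works. Change fileName1 in the second open() to fileName2, and change the mode argument to "w+".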
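The junk described above is consistent with how "r+" mode works (an explanation offered here, not one the answerer confirmed): opening with "r+" never truncates the file, so when the rewritten text is shorter than the original, the old tail stays at the end. Its size tracks the total length difference, not the number of replacements. A minimal sketch showing the effect and a single-handle fix (read everything, seek(0), write, truncate()); demo.txt is a hypothetical file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # 1) Overwrite in "r+" mode with shorter text and never truncate.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write("hello world\n")
    with open(path, "r+", encoding="utf-8", newline="") as fh:
        fh.write("hi\n")  # replaces only the first 3 bytes
    with open(path, encoding="utf-8", newline="") as fh:
        without_truncate = fh.read()

    # 2) Same rewrite on one handle: read all, rewind, write, truncate.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write("hello world\n")
    with open(path, "r+", encoding="utf-8", newline="") as fh:
        text = fh.read()
        fh.seek(0)
        fh.write(text.replace("hello world", "hi"))
        fh.truncate()  # cut the file at the current position
    with open(path, encoding="utf-8", newline="") as fh:
        with_truncate = fh.read()

print(repr(without_truncate))  # 'hi\nlo world\n'
print(repr(with_truncate))     # 'hi\n'
```

Writing to a second file works for the same reason: "w+" truncates it on open, so no old bytes can survive.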