Question

I'm trying to replace certain words in a file that has multiple lines. Below is the code I wrote. Please note that I'm still learning Python.

ParsedUnFormattedFile = io.open("test.txt", "r", encoding="utf-8", closefd=True).read()

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

for line in ParsedUnFormattedFile:
    for i in remArticles.keys():
           words = line.split()
           ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])

    FormattedFileForIndexing =  io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8", closefd=True)
    FormattedFileForIndexing.write(ParsedReplacementFile)

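If I do the replacement directly on each line as it is read, it replaces only one of the words, usually "the" on my system.

So I thought I would split the line, look for each word, and then replace it. But I get the following error: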

line 14, in <module>
    ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
TypeError: coercing to Unicode: need string or buffer, list found

How can I fix this?

Thanks

Answer 1

When you call split(), you get back a list:

>>> 'a b c asd sas'.split()
['a', 'b', 'c', 'asd', 'sas']

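Instead, do the replacement before splitting, or join the list back into a string and then replace. To join the list back into a string, keeping a space between the words so that keys like ' the ' can still match:

words = ' '.join(words)

E.g.:

>>> ' '.join(['a','b','c'])
'a b c'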
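A minimal sketch of the "replace before splitting" option, assuming Python 3 (so the TypeError wording differs from the Python 2 message in the question). The names ARTICLES and remove_articles are hypothetical, and mapping each article to a single space instead of '' is an assumption, made so that neighbouring words don't get glued together:

```python
# Hypothetical names; the four keys are the question's remArticles.
# Assumption: each maps to ' ' (not '') so "see the cat" doesn't become "seecat".
ARTICLES = {' a ': ' ', ' the ': ' ', ' and ': ' ', ' an ': ' '}


def remove_articles(line):
    """Do every replacement on the string first; split afterwards if needed."""
    for old, new in ARTICLES.items():
        line = line.replace(old, new)  # feed each result into the next replace
    return line


text = "I saw the cat and a dog"
cleaned = remove_articles(text)
print(cleaned)          # I saw cat dog
print(cleaned.split())  # ['I', 'saw', 'cat', 'dog']

# What the question does: pass the list from split() as the "old" argument.
try:
    text.replace(text.split(), '')
    raised = False
except TypeError as exc:  # str.replace only accepts strings
    raised = True
    print("TypeError:", exc)
```

Splitting only after all replacements are done means str.replace always receives strings, which is exactly what the error message is complaining about.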

Answer 2

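There are several problems.

ParsedUnFormattedFile is a string, not a file, because you called .read(). This means your for line in ParsedUnFormattedFile loop doesn't iterate over the lines of the file; it iterates over the individual characters.
Each time the for i in remArticles.keys(): loop runs, ParsedReplacementFile is assigned a new value computed from the original text, so only the last replacement is kept.
You overwrite the file FormattedFileForIndexing.txt on every iteration of the for line in ParsedUnFormattedFile: loop.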
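All three points can be reproduced without the real input file. The sample text and the temporary output path below are made up for illustration, and the sketch assumes Python 3:

```python
import os
import tempfile

text = "the cat\nand a dog\n"  # made-up stand-in for what .read() returns

# 1. Iterating over a string gives characters, not lines.
first_three = list(text)[:3]
print(first_three)        # ['t', 'h', 'e']
print(text.splitlines())  # ['the cat', 'and a dog']

# 2. Starting from the original string on every pass keeps only the last replacement.
articles = {' a ': ' ', ' the ': ' '}
line = " the cat and a dog "
for old, new in articles.items():
    last_only = line.replace(old, new)  # always starts from `line` again
print(repr(last_only))    # ' cat and a dog '

chained = line
for old, new in articles.items():
    chained = chained.replace(old, new)  # builds on the previous result
print(repr(chained))      # ' cat and dog '

# 3. Opening a file with "w" inside the loop truncates it each time.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
for chunk in ["first\n", "second\n"]:
    with open(path, "w", encoding="utf-8") as f:
        f.write(chunk)
with open(path, encoding="utf-8") as f:
    overwritten = f.read()
print(repr(overwritten))  # 'second\n'
```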

It's better to rewrite the whole thing from scratch:

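import io

# Note: replacing ' the ' with '' also removes the spaces around it,
# so the neighbouring words end up joined together.
remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

with io.open("test.txt", "r", encoding="utf-8") as ParsedUnFormattedFile:
    with io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8") as FormattedFileForIndexing:
        # Iterating over the file object (not its .read() result) yields lines.
        for line in ParsedUnFormattedFile:
            # Apply every replacement to the same line, keeping each result.
            for i in remArticles:
                line = line.replace(i, remArticles[i])
            # The output file is opened only once, so no line overwrites another.
            FormattedFileForIndexing.write(line)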
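The space-padded keys above miss "The" at the start of a line and an article followed by punctuation or a line break. If that matters, one alternative (swapped in here, not part of this answer) is a regular expression with \b word boundaries. A sketch, assuming Python 3 and the same four words as remArticles; strip_articles and strip_articles_in_file are hypothetical names:

```python
import re

# Swapped-in technique: \b word boundaries instead of space-padded keys.
# Same four words as remArticles; IGNORECASE also catches "The" and "AND".
# [ \t]* eats the spaces/tabs after a match but never the newline.
ARTICLE_RE = re.compile(r'\b(?:a|an|the|and)\b[ \t]*', re.IGNORECASE)


def strip_articles(line):
    """Remove the listed words from one line, keeping its line ending."""
    return ARTICLE_RE.sub('', line)


def strip_articles_in_file(src, dst):
    """Stream src line by line and write the cleaned lines to dst."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_articles(line))


print(repr(strip_articles("The cat and a dog.\n")))  # 'cat dog.\n'
```

Because the pattern consumes only spaces and tabs after a matched word, line breaks survive and the output keeps the same number of lines as the input; the trade-off is a pattern that is harder to read than plain str.replace.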

Replace words in a line of a file in Python
