How to update a text file in a loop in Python

Time: 2016-08-11 02:04:27

Tags: python python-2.7 nlp text-mining

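The function I wrote can only update the text file once per call, but I need to update it repeatedly. To avoid copying a temporary file over the target file again and again, I want to update all the words in a single pass. How can I do that? Here is my Python code (it only updates once):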

import io
from tempfile import mkstemp
from shutil import move
from os import remove, close

def replaceWords(source_file_path, old_word, cluster_labels):

    new_word_list = [old_word + "_" + str(label) for label in cluster_labels]
    fh, target_file_path = mkstemp()

    with io.open(target_file_path, mode='w', encoding='utf8') as target_file:
        with io.open(source_file_path, mode='r', encoding='utf8') as source_file:
            index = 0
            for line in source_file:
                words =[]
                for word in line.split():
                    if word == old_word:
                        words.append(word.replace(old_word, new_word_list[index]))
                        index += 1
                    else:
                        words.append(word)
                target_file.write(u" ".join(words) + u"\n")  # split() drops the line break, so restore it

    close(fh)
    remove(source_file_path)
    move(target_file_path, source_file_path)

For example:

First update:

Source file content: of anarchism have often been divided into the categories of social and individualist anarchism or similar dual classifications

old_word: 'of'

cluster_labels: [1,2]

After the update:

Target file content: of_1 anarchism have often been divided into the categories of_2 social and individualist anarchism or similar dual classifications

Second update:

old_word: 'anarchism'

cluster_labels: [1,2]

After the update:

Target file content: of_1 anarchism_1 have often been divided into the categories of_2 social and individualist anarchism_2 or similar dual classifications

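In my code I have to call the function twice and copy the file twice. When there are many words to update, this approach is slow, and the repeated reading, writing, and copying is not I/O-friendly.

So, is there a way to handle this elegantly without so much reading, writing, and copying?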

1 answer:

Answer 0: (score: 0)

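There are many ways to do this. One approach that stays close to your inline method is to use *argv to take the list of words to replace, and to replace each of those words in the current line. I have added some pseudocode here; it has not been tested for errors. Note the two changes: 1. the function's input parameters; 2. an added for loop that iterates over the input arguments.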
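As a quick illustration of how `*argv` collects arguments (the function name `show_words` is hypothetical and only returns what it receives): extra positional arguments arrive as a tuple, so the words are best passed as separate arguments or unpacked with `*`.

```python
def show_words(source_file_path, cluster_labels, *argv):
    # argv is a tuple holding every extra positional argument.
    return argv


# Words passed separately arrive as one tuple element each.
separate = show_words('file.txt', [1, 2], 'of', 'anarchism')

# A tuple passed as a single argument arrives as ONE element.
nested = show_words('file.txt', [1, 2], ('of', 'anarchism'))

# Unpacking the tuple with * is equivalent to passing the words separately.
unpacked = show_words('file.txt', [1, 2], *('of', 'anarchism'))
```

Looping over `argv` in the `nested` case would compare each word in the file against the whole tuple, so nothing would match.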

#! /usr/bin/python

import io
from tempfile import mkstemp
from shutil import move
from os import remove, close
import logging
logging.basicConfig(level=logging.DEBUG, format=' %(asctime)s -%(levelname)s - %(message)s')

def replaceWords(source_file_path, cluster_labels, *argv):
    # argv holds every word to replace; all of them share cluster_labels
    fh, target_file_path = mkstemp()
    logging.debug(argv)
    logging.debug(cluster_labels)

    with io.open(target_file_path, mode='w', encoding='utf8') as target_file:
        with io.open(source_file_path, mode='r', encoding='utf8') as source_file:
            # one label counter per word, so each word is numbered separately
            index = dict((wordtochange, 0) for wordtochange in argv)
            for line in source_file:
                words = []
                for word in line.split():
                    for wordtochange in argv:
                        if word == wordtochange:
                            words.append(word + u"_" + str(cluster_labels[index[wordtochange]]))
                            index[wordtochange] += 1
                            break
                    else:
                        # no word in argv matched, so keep the word unchanged
                        words.append(word)
                # split() drops the line break, so add it back
                target_file.write(u" ".join(words) + u"\n")

    close(fh)
    remove(source_file_path)
    move(target_file_path, source_file_path)

# pass the words as separate arguments so *argv receives each one
replaceWords('file.txt', [1, 2], 'of', 'anarchism')
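
If each word needs its own list of cluster labels, as in the question's example, one possible variation is to pass a dictionary instead of `*argv`. The following is only a sketch under that assumption: the names `replace_words_once` and `labels_by_word` are hypothetical, and leaving a word unchanged once its labels run out is a choice made here, not something the question specifies. The file is still read once and written once.

```python
import io
import os
from shutil import move
from tempfile import mkstemp


def replace_words_once(source_file_path, labels_by_word):
    """Rewrite the file in one pass; labels_by_word maps word -> list of labels."""
    # Position of the next unused label for each word.
    next_index = dict((word, 0) for word in labels_by_word)
    # Create the temp file beside the source so the final move stays on one filesystem.
    source_dir = os.path.dirname(os.path.abspath(source_file_path))
    fh, target_file_path = mkstemp(dir=source_dir)
    os.close(fh)

    with io.open(target_file_path, mode='w', encoding='utf8') as target_file:
        with io.open(source_file_path, mode='r', encoding='utf8') as source_file:
            for line in source_file:
                words = []
                for word in line.split():
                    labels = labels_by_word.get(word)
                    if labels is not None and next_index[word] < len(labels):
                        words.append(u"%s_%s" % (word, labels[next_index[word]]))
                        next_index[word] += 1
                    else:
                        # Not a target word, or its labels are used up: keep as-is.
                        words.append(word)
                # split() drops the line break, so add it back.
                target_file.write(u" ".join(words) + u"\n")

    os.remove(source_file_path)
    move(target_file_path, source_file_path)
```

For the example in the question, the call would be `replace_words_once('file.txt', {u'of': [1, 2], u'anarchism': [1, 2]})`, which labels both words in a single read and a single write.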