Question

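I am currently working on a small program.

The purpose of the program is to take input from a file, remove any word that contains the letter "l", and then write the result to an output file.

My current code runs, but it does not remove the words that contain the letter "l", only the letter itself.

Here is my code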

def my_main(ifile_name, ofile_name):
    ifile_name = open(ifile_name, 'r')
    ofile_name = open(ofile_name, "w+")
    delete_list = ['l']
    for line in ifile_name:
        for word in delete_list:
            line = line.replace(word, "")
        ofile_name.write(line)
    ifile_name.close()
    ofile_name.close()

Thanks

Update

This is what the input file looks like:

The first line never changes. 
The second line was a bit much longer. 
The third line was short. 
The fourth line was nearly the longer line. 
The fifth was tiny. 
The sixth line is just one line more.
The seventh line was the last line of the original file.

When the code is correct, the output file should look like this

The first never changes. 
The second was a bit much. 
The third was short. 
The fourth was the. 
The fifth was tiny. 
The sixth is just one more.
The seventh was the of the.

Answer 1

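Without seeing what your file looks like, it is hard to say exactly what to use, so it would be great if you could update the question.

At the moment, though, you are looping over individual letters rather than words. Use split() to break each line into a list of words, change that list, and then join the words back together to get a string without the words that contain your letter.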

words = ''
with open(ifile_name, "r") as file:
    for line in file:
        # split() with no argument also discards the trailing newline
        list_of_words = line.split()
        for key, word in enumerate(list_of_words):
            if 'l' in word:
                list_of_words[key] = ''

        words += ' '.join(w for w in list_of_words if w != '')
        words += '\n'

with open(ofile_name, "w+") as file:
    file.write(words)

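The nice thing about this is that you don't get any whitespace problems. You end up with a regular string with single spaces between words.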
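The whitespace behaviour can be checked on a single string. This is only a minimal sketch: the helper name `drop_words_with` is hypothetical, and it assumes `split()` is called with no argument so that runs of spaces and the trailing newline are discarded.

```python
def drop_words_with(line, letter="l"):
    """Return line without any word containing letter, single-spaced."""
    # split() with no argument splits on any run of whitespace,
    # so trailing spaces and the newline never become "words".
    kept = [word for word in line.split() if letter not in word]
    return " ".join(kept)


print(drop_words_with("The first line never changes. \n"))
print(drop_words_with("The   fourth line was nearly the longer line.\n"))
```

Note that punctuation attached to a word counts as part of it, so "line." is dropped together with its period.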

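Edit: as pointed out in the comments, a better approach (one that does not hold the whole file in memory) is to write each line out as soon as it has been processed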

with open(ifile_name, "r") as in_file, open(ofile_name, "w+") as out_file:
    for line in in_file:
        # split() with no argument also discards the trailing newline
        list_of_words = line.split()
        for key, word in enumerate(list_of_words):
            if 'l' in word:
                list_of_words[key] = ''

        out_file.write(' '.join(w for w in list_of_words if w != '') + '\n')

Answer 2

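If you just need a complete new file and don't need to keep a record of the removed words, here is a very simple solution that doesn't require you to store all of the data in memory: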

def remove_words(in_file, to_remove, out_file):
    with open(in_file) as f, open(out_file, "w") as f2:
        f2.writelines(" ".join([word for word in line.split()
                         if not to_remove.issubset(word)]) + "\n"
                             for line in f)


remove_words("test.txt", {"l"}, "removed.txt")

Here it is run on the lines from your update:

In [23]: cat test.txt
The first line never changes.
The second line was a bit much longer.
The third line was short.
The fourth line was nearly the longer line.
The fifth was tiny.
The sixth line is just one line more.
The seventh line was the last line of the original file.

In [24]: remove_words("test.txt",{"l"},"removed.txt")

In [25]: cat removed.txt
The first never changes.
The second was a bit much
The third was short.
The fourth was the
The fifth was tiny.
The sixth is just one more.
The seventh was the of the
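One thing to keep in mind: `to_remove.issubset(word)` is true only when every character in `to_remove` occurs in the word. For a one-element set such as `{"l"}` that is the same as "contains l". If the intent is to drop words containing any of several letters, `isdisjoint` is one option. A small sketch; `filter_line` is a hypothetical helper and not part of the answer above.

```python
def filter_line(line, to_remove):
    """Drop words that contain at least one character from to_remove."""
    # isdisjoint(word) is True when word shares no character with to_remove.
    return " ".join(w for w in line.split() if to_remove.isdisjoint(w))


# With a single letter, issubset and isdisjoint select the same words.
print(filter_line("The sixth line is just one line more.", {"l"}))
# With two letters, any word containing "l" or "x" is dropped.
print(filter_line("The sixth line is just one line more.", {"l", "x"}))
```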

Answer 3

One idea might be to use a regular expression, re.sub(r'\S*l\S*',r'',text). The full program would look like this:

import re

def my_main(ifile_name, ofile_name):
    with open (ifile_name,"r") as ifile_name :
        text=ifile_name.read()
    text2 = re.sub(r'\S*l\S*',r'',text)
    with open(ofile_name, "w+") as ofile_name :
        ofile_name.write(text2)

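The problem is that this removes only the word itself, not the whitespace around it. A potential solution is to also capture the whitespace after (or before) the word: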

re.sub(r'\S*l\S*\s*',r'',text)

The program then looks like this:

import re

def my_main(ifile_name, ofile_name):
    with open (ifile_name,"r") as ifile_name :
        text=ifile_name.read()
    text2 = re.sub(r'\S*l\S*\s*',r'',text)
    with open(ofile_name, "w+") as ofile_name :
        ofile_name.write(text2)

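A potential drawback of this approach is that the file has to fit in (virtual) memory: for large files (1 GiB+), the process may slow down or even be killed by the operating system for using too many resources.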
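A line-by-line variant avoids reading the whole file at once. The sketch below is an assumption on my part rather than the code above: it matches only spaces and tabs (`[ \t]*`) in front of the word instead of `\s*`, because `\s` also matches the newline and would join two lines whenever the last word of a line is removed. The helper name `strip_l_words` is hypothetical.

```python
import re

# A word containing "l", plus any spaces or tabs before it (not newlines).
PATTERN = re.compile(r"[ \t]*\S*l\S*")


def strip_l_words(line):
    """Remove words containing "l" from one line, keeping its newline."""
    # lstrip drops the gap left behind when the first word is removed
    # (it also drops any indentation the line already had).
    return PATTERN.sub("", line).lstrip(" \t")


def my_main(ifile_name, ofile_name):
    # Iterating over the file object reads one line at a time.
    with open(ifile_name, "r") as ifile, open(ofile_name, "w") as ofile:
        for line in ifile:
            ofile.write(strip_l_words(line))
```

Whitespace after the last kept word, such as a trailing space before the newline, is left as it was in the input.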

Answer 4

Think it through carefully: what are you actually looping over?

for line in ifile_name:  # line is each line of the file in turn
    for word in delete_list:  # word is each entry of delete_list (here it is really a letter, 'l')
        line = line.replace(word, "")  # this replaces every 'l' with an empty string, so only the letter goes

You probably want something more like this:

for line in ifile_name:
    for word in line.split():  # iterate through the words in your line, not delete_list
        if any(x in word for x in delete_list):  # check if any of the letters in delete_list are in word
            line = line.replace(word, '')  # replace the whole word with an empty string

Note that this code leaves extra spaces behind:

this_line_is -> this__is
    ^    ^          ^^

So you could call line = line.replace(word+' ', '') instead, but that can cause problems in cases such as 'wordwithl.', where the word is not followed by a space.
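There is a second pitfall with `replace`: it works on substrings, so removing one word can damage a longer word that contains it. One way around both issues is to rebuild the line from the surviving words. This is a rough sketch, assuming it is acceptable to collapse whitespace to single spaces; both function names are hypothetical.

```python
def remove_by_replace(line, delete_list):
    """The replace-based loop from above, wrapped in a function."""
    for word in line.split():
        if any(x in word for x in delete_list):
            line = line.replace(word, "")
    return line


def remove_by_filter(line, delete_list):
    """Rebuild the line from the words that contain no listed letter."""
    kept = [w for w in line.split()
            if not any(x in w for x in delete_list)]
    return " ".join(kept)


# Removing "all" also eats part of "ball", so a stray "b" survives.
print(repr(remove_by_replace("all ball is", ["l"])))
print(repr(remove_by_filter("all ball is", ["l"])))
print(repr(remove_by_filter("a wordwithl.", ["l"])))
```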

Python: remove words containing "l"

4 answers: