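I am trying to open a text file, remove certain words that end with a particular character, and then write the new content to a new file. With the code below, new_content contains what I need and a new file is created, but the file is empty. I can't figure out why. I have tried different indentation and passing in an encoding, with no luck. Any help is much appreciated.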
import glob
import os
import nltk, re, pprint
from nltk import word_tokenize, sent_tokenize
import pandas
import string
import collections

path = "/pathtofiles"
for file in glob.glob(os.path.join(path, '*.txt')):
    if file.endswith(".txt"):
        f = open(file, 'r')
        flines = f.readlines()
        for line in flines:
            content = line.split()
            for word in content:
                if word.endswith(']'):
                    content.remove(word)
            new_content = ' '.join(content)
            f2 = open((file.rsplit( ".", 1 )[ 0 ] ) + "_preprocessed.txt", "w")
            f2.write(new_content)
        f.close
Answer 0 (score: 1)
This should work for you, @firefly. If you run into problems, I'm happy to answer questions.
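import glob
import os

path = "/pathtofiles"
for file in glob.glob(os.path.join(path, '*.txt')):
    if file.endswith(".txt"):
        # The with block closes the input file automatically.
        with open(file, 'r') as f:
            flines = f.readlines()
        new_content = []
        for line in flines:
            content = line.split()
            # Build a new list instead of removing items from the list being iterated.
            new_content_line = []
            for word in content:
                if not word.endswith(']'):
                    new_content_line.append(word)
            new_content.append(' '.join(new_content_line))
        # Open the output once per input file, after every line has been processed.
        # The with block flushes and closes it; a bare f2.close (no parentheses) does nothing.
        with open(file.rsplit(".", 1)[0] + "_preprocessed.txt", "w") as f2:
            f2.write('\n'.join(new_content))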
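A side note on why building a new list matters: the loop in the question calls content.remove(word) while iterating over content, and removing an item shifts the later ones left, so the word right after each removed word is never checked. The sketch below illustrates this on a made-up sample line and then writes a filtered file using with blocks. The function names remove_in_place, filter_words and preprocess_file, the sample text, and the temporary file names are all hypothetical and only serve the illustration.

```python
import os
import tempfile


def remove_in_place(words):
    """Mimics the question's loop: mutates the list while iterating over it."""
    words = list(words)  # copy so the caller's list is left untouched
    for word in words:
        if word.endswith("]"):
            words.remove(word)  # later items shift left, so the next one is skipped
    return words


def filter_words(words):
    """Builds a new list instead, so no element is skipped."""
    return [word for word in words if not word.endswith("]")]


def preprocess_file(src_path, dst_path):
    """Hypothetical helper: filter every line of src_path and write it to dst_path."""
    # Both files are closed (and the output flushed) when the with block ends.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(" ".join(filter_words(line.split())) + "\n")


sample = "keep a] b] this".split()
buggy = remove_in_place(sample)  # "b]" survives because it was skipped
fixed = filter_words(sample)

# Exercise the file helper in a throwaway directory (assumed sample content).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.txt")
    dst_path = os.path.join(tmp, "example_preprocessed.txt")
    with open(src_path, "w") as handle:
        handle.write("keep a] b] this\nsecond x] line\n")
    preprocess_file(src_path, dst_path)
    with open(dst_path, "r") as handle:
        written = handle.read()

print(buggy)
print(fixed)
print(written)
```

Writing line by line inside a single with block also avoids reopening the output file in "w" mode more than once, which would truncate whatever had been written before.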