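I'm having trouble writing regex results from multiple HTML files (non-English text) to a .txt outfile. On screen, the results print as multiple strings on separate lines, but when I try to write them to the outfile, only a single, seemingly random string ends up there. How can I write all the strings from all of the roughly 100 files to the outfile? My code looks like this: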
from bs4 import BeautifulSoup
import glob
import sys
import string
import re
import os

# Collect every input file in the MEK folder
text = glob.glob('C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*')
for filename in text:
    with open(filename, encoding='ISO-8859-1', errors="ignore") as f:
        mytext = f.read()
        soup = BeautifulSoup(mytext, "lxml")
        extracted_text = soup.getText()
        pattern = r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b"
        # Join all matches found in this file into one comma-separated string
        result = (", ".join(re.findall(pattern, mytext)))
        file = "C:/Users/dell/Desktop/python-for-text-analysis-master/Data/Charlie/charlie_neww.txt"
        for row in result:
            with open (file, "w", encoding="iso-8859-1", errors="ignore") as outfile:
                print(result, end='\n', file=outfile)
Answer 0 (score: 0)
with open (file, "w", ...
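Mode "w" truncates the file, i.e. the file is cleared every time it is opened. Because the open() call sits inside the loop, each iteration wipes out what the previous one wrote, so only the last result survives. Consider append mode, "a", instead.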
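An alternative to append mode is to open the outfile once, before looping over the input files, so that "w" truncates it only a single time per run. The following is a minimal sketch of that idea, not the asker's exact script: the function name `write_all_matches` and its parameters are hypothetical, the BeautifulSoup step is left out because the question applies the regex to the raw file contents, and each match is written on its own line rather than comma-joined.

```python
import re

# Same pattern as in the question
PATTERN = re.compile(r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b")


def write_all_matches(input_paths, out_path, pattern=PATTERN):
    """Write every match from every input file to out_path, one per line.

    Hypothetical helper for illustration. Returns the number of matches written.
    """
    total = 0
    # Open the output once, outside the loop: "w" clears the file a single time.
    with open(out_path, "w", encoding="iso-8859-1", errors="ignore") as outfile:
        for path in input_paths:
            with open(path, encoding="iso-8859-1", errors="ignore") as f:
                text = f.read()
            # findall returns a list of matched strings for this file
            for match in pattern.findall(text):
                outfile.write(match + "\n")
                total += 1
    return total
```

It could be called with the list returned by glob.glob(...) and the outfile path from the question. Compared with mode "a", this keeps the outfile from growing with duplicate results each time the script is rerun.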