Question

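I'm having trouble writing RegEx results from multiple HTML files (non-English text) to a .txt outfile. On screen, the results print as multiple strings on separate lines, but when I try to write them to the outfile, only one seemingly random string gets written. Can you help me write all the strings from all of the roughly 100 files to the outfile? My code looks like this: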

from bs4 import BeautifulSoup
import sys
import string
import re
import os
import glob

text = glob.glob('C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*')   
for filename in text:
    with open(filename, encoding='ISO-8859-1', errors="ignore") as f:
        mytext = f.read()

soup = BeautifulSoup(mytext, "lxml")
extracted_text = soup.getText()

pattern = r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b"
result = (", ".join(re.findall(pattern, mytext)))

file = "C:/Users/dell/Desktop/python-for-text-analysis-master/Data/Charlie/charlie_neww.txt"
for row in result:
    with open (file, "w", encoding="iso-8859-1", errors="ignore") as outfile:
        print(result, end='\n', file=outfile)

Answer 1

with open (file, "w", ...

Mode "w" truncates the file (that is, the file is cleared every time it is opened). Consider mode "a", which appends instead.
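As a rough sketch of how this could look in practice: instead of reopening the outfile for every result, open it once before looping over the input files, so that "w" truncates only a single time (opening with "a" inside the loop would also keep earlier results, but would keep growing the file across runs). The function name `write_matches` and its parameters are hypothetical, not from the question. The sketch also assumes the regex is run on the raw file text, as the question's code effectively does, and leaves out the BeautifulSoup step.

```python
import glob
import re

# Pattern copied from the question.
PATTERN = re.compile(r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b")


def write_matches(input_glob, out_path, pattern=PATTERN, encoding="iso-8859-1"):
    """Write every regex match from every input file to out_path, one per line.

    Hypothetical helper for illustration; returns the number of matches written.
    """
    count = 0
    # Open the outfile ONCE, outside the loop: "w" clears the file only here.
    with open(out_path, "w", encoding=encoding, errors="ignore") as outfile:
        for filename in sorted(glob.glob(input_glob)):
            # Read each input file inside the loop so no file is skipped.
            with open(filename, encoding=encoding, errors="ignore") as f:
                text = f.read()
            for match in pattern.findall(text):
                print(match, file=outfile)
                count += 1
    return count
```

It could be called as `write_matches('C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*', file)`, with `file` being the outfile path from the question. Note that in the question's code only the `open`/`read` lines are indented under the `for filename in text:` loop, so the regex only ever sees the last file read; the sketch moves the search inside the loop as well.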
