Question

I am using the following code to write the matches of a RegEx to a txt file, but I always get this error message:

  File "C:\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

import glob
import os
import re


def extractor():
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            extract = re.compile(r'RegEx', re.I | re.S)
            if re.findall(extract, contents) is not None:
                for x in re.findall(extract, contents):
                    out.write(x)
            out.close()
extractor()

Does anyone know what causes this error? Apparently it has something to do with a type error?

Answer 1

With a few small adjustments:

import glob
import os
import re


def extractor():
    # you only need it once, don't you?
    extract = re.compile(r'RegEx', re.I | re.S)
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        # the with block closes both files, so no explicit out.close() is needed
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            for match in extract.findall(contents):
                out.write(match)

extractor()

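This uses extract as a compiled pattern object, and you don't even need the "if ... is not None" check inside the loop.
If it still doesn't work, please give more detail about your actual regular expression (does it contain several groups, etc.?).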
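
The number of groups matters because findall changes its return type with the pattern: a list of strings when there are no groups or one group, a list of tuples when there are two or more, and an empty list (never None) when nothing matches. Passing a tuple to out.write() raises a TypeError of its own. Here is a small sketch of that behaviour. The sample HTML, the patterns and the to_lines helper are made up for illustration, they are not your actual RegEx:

```python
import re

# Hypothetical sample input and patterns, chosen only for illustration.
html = "<a href='x.html'>X</a><a href='y.html'>Y</a>"

no_group = re.compile(r"<a [^>]*>", re.I | re.S)
two_groups = re.compile(r"<a href='([^']*)'>([^<]*)</a>", re.I | re.S)

# No groups: findall returns a list of strings (the whole matches).
whole_matches = no_group.findall(html)

# Two groups: findall returns a list of tuples, one item per group.
group_matches = two_groups.findall(html)

# No match: findall returns an empty list, so a "for" loop simply does nothing.
no_matches = re.compile(r"<table>").findall(html)


def to_lines(matches):
    """Hypothetical helper: make findall output writable by joining tuple items with tabs."""
    return [m if isinstance(m, str) else "\t".join(m) for m in matches]
```

Inside the loop you could then write "\n".join(to_lines(extract.findall(contents))) to the output file, which works whether or not the pattern has groups.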
