I'm using the following code to write the matches of a RegEx to a txt file, but I always get this error message:

  File "C:\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

import glob
import os
import re

def extractor():
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            extract = re.compile(r'RegEx', re.I | re.S)
            if re.findall(extract, contents) is not None:
                for x in re.findall(extract, contents):
                    out.write(x)
            out.close()

extractor()

Does anyone know what is causing this error? Apparently it has something to do with a type error?
Answer 0: (score: 0)
With a few tweaks:

import glob
import os
import re

def extractor():
    # compile the pattern once, outside the loop
    extract = re.compile(r'RegEx', re.I | re.S)
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            for match in extract.findall(contents):
                out.write(match)
            out.close()  # redundant: the with block already closes the file

extractor()

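This uses extract as a compiled pattern object and doesn't even need the if ... is not None check in the loop, because findall returns an empty list (never None) when nothing matches.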
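To illustrate why the check can go, here is a minimal sketch. The pattern and the sample strings are made up for the example and are not the regex from the question:

```python
import re

# Hypothetical pattern and inputs, for illustration only.
extract = re.compile(r"<b>(.*?)</b>", re.I | re.S)

no_hits = extract.findall("<p>nothing bold here</p>")
hits = extract.findall("<B>one</B> and <b>two</b>")

print(no_hits)  # an empty list, not None, so the for loop simply runs zero times
print(hits)     # one string per match, because the pattern has a single group
```

Looping over an empty list does nothing, so the for loop alone already covers the "no match" case.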
If it still doesn't work, please post your actual regular expression (does it have several groups, etc.?).
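The reason for asking about groups: with two or more capturing groups, findall returns a list of tuples rather than strings, and passing a tuple to write() raises a TypeError of its own. That may or may not be what is happening in your case, since the traceback above is raised inside re.findall itself. The sketch below assumes a hypothetical two-group pattern and uses io.StringIO in place of the output file:

```python
import io
import re

# Hypothetical two-group pattern; the real regex from the question is unknown.
extract = re.compile(r'<a href="(.*?)">(.*?)</a>', re.I | re.S)
contents = '<a href="https://example.com">Example</a>'

matches = extract.findall(contents)  # list of tuples, one tuple item per group

out = io.StringIO()  # stands in for the output text file
error = None
try:
    out.write(matches[0])  # write() needs a str, not a tuple
except TypeError as exc:
    error = str(exc)

# One possible fix: join the groups of each match into a single line.
for groups in matches:
    out.write("\t".join(groups) + "\n")

result = out.getvalue()
print(matches)
print(error)
```

If you only need the whole match, making the groups non-capturing with (?:...) lets findall return plain strings again.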