Here is the code snippet:
for i in obj:
    url = "someurl" + i
    oars = requests.get(url, timeout=1)
    soup = BeautifulSoup(oars.content)
    fout = open(i + ".html", "wt")
    print((type(soup.prettify)))
    fout.write(oars.text)
    oars.close
    #fout.write(soup.get_text())
    # Still not working, using zsh for now
    if call("html2text " + i + ".html" + ">" + i + ".txt", shell=True) == 0:
        print("yay")
        #call("rm -f " + i + ".html", shell=True)
    else:
        print(i)
But html2text only creates empty txt files instead of writing the expected output.
I even tried replacing html2text with elinks -dump, but to no avail.
Answer 0: (score: 0)
Not sure, but this may be what you're after:
import subprocess
import sys
outfile = i + ".txt"
cmd = sys.path[0] + "/htmltotext " + i + ".html"
with open(outfile, "w") as output_f:
    p = subprocess.Popen(cmd, stdout=output_f, shell=True)
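A variation on the same idea, as a rough sketch rather than a confirmed fix. One likely reason for the empty txt files, assuming the question's code runs as posted, is that fout is never closed, so the HTML may still be sitting in Python's write buffer when the converter starts. The sketch below writes and closes the HTML file first, then runs the converter with its stdout pointed at the txt file. The function name convert_with_command and its parameters are made up for illustration, and it assumes the converter takes the input path as its last argument and prints the text to stdout.

```python
import subprocess
from pathlib import Path


def convert_with_command(command, html, html_path, txt_path):
    """Save html to html_path, run command on it, and store its stdout in txt_path.

    command is a list such as ["html2text"] (assumed to be on PATH).
    Returns True when the converter exits with status 0.
    """
    # write_text opens, writes, and closes the file, so the content is
    # on disk before the external converter tries to read it.
    Path(html_path).write_text(html, encoding="utf-8")

    # Passing a list (no shell=True) avoids quoting problems in file names;
    # the redirection is done by handing the open file to stdout.
    with open(txt_path, "w", encoding="utf-8") as out:
        result = subprocess.run([*command, str(html_path)], stdout=out)
    return result.returncode == 0
```

Inside the question's loop this would be called as convert_with_command(["html2text"], oars.text, i + ".html", i + ".txt"). Unlike Popen, subprocess.run waits for the converter to finish before returning.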
Answer 1: (score: 0)
Why not use html2text as a Python library?
import html2text

h = html2text.HTML2Text()
txt = h.handle(open(infile).read())
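To show how that could slot into the question's loop, here is a minimal sketch that converts the downloaded HTML in-process and writes the txt file directly, with no intermediate .html file and no shell call. The helper names save_as_text and default_converter are hypothetical, and the convert parameter is an assumption added here so the function can be tried without the html2text package installed.

```python
from pathlib import Path
from typing import Callable, Optional


def default_converter() -> Callable[[str], str]:
    """Return html2text's converter (third-party: pip install html2text)."""
    import html2text  # imported lazily so the module loads without it

    return html2text.HTML2Text().handle


def save_as_text(html: str, txt_path: str,
                 convert: Optional[Callable[[str], str]] = None) -> None:
    """Convert an HTML string to plain text and write it to txt_path."""
    if convert is None:
        convert = default_converter()
    # write_text closes the file when done, so nothing is left unflushed.
    Path(txt_path).write_text(convert(html), encoding="utf-8")
```

In the loop from the question, the call would be save_as_text(oars.text, i + ".txt") right after requests.get.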