Iteratively calling html2text on a set of HTML files is not working

Time: 2013-10-31 04:59:28

Tags: python subprocess piping

Here is the code snippet:

for i in obj:
    url = "someurl" + i
    oars = requests.get(url, timeout=1)
    soup = BeautifulSoup(oars.content)
    fout = open(i + ".html", "wt")
    print((type(soup.prettify)))
    fout.write(oars.text)
    oars.close()
    #fout.write(soup.get_text())
    # Still not working, using zsh for now
    if call("html2text " + i + ".html" + ">" + i + ".txt", shell=True) == 0:
        print("yay")
        #call("rm -f " + i + ".html", shell=True)
    else:
        print(i)

But html2text only creates empty txt files instead of writing the proper output. I even tried replacing html2text with elinks -dump, to no avail.
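One thing worth checking in the snippet above: `fout` is never closed or flushed before `call(...)` runs, so the external program may be reading a file whose content is still sitting in Python's write buffer. The sketch below only illustrates that buffering effect; the helper name `size_before_and_after_close` is made up for this example and is not part of the original code.

```python
import os
import tempfile


def size_before_and_after_close(text):
    """Return the on-disk size of a file before and after close()."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)

    fout = open(path, "wt")
    fout.write(text)
    # For a small write the data is typically still buffered in memory here,
    # so another process reading the file would see little or nothing.
    before = os.path.getsize(path)

    fout.close()  # close() flushes the buffer to disk
    after = os.path.getsize(path)

    os.remove(path)
    return before, after
```

If this is the cause, closing the file (or writing it inside a `with open(...) as fout:` block) before calling the converter should be enough.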

2 answers:

Answer 0: (score: 0)

Not sure, but this might be what you are after:

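import subprocess
import sys

# i is the loop variable from the question's code
outfile = i + ".txt"

# htmltotext is expected to sit in the script's own directory (sys.path[0])
cmd = sys.path[0] + "/htmltotext " + i + ".html"

with open(outfile, "w") as output_f:
    # send the command's stdout straight into the output file
    p = subprocess.Popen(cmd, stdout=output_f, shell=True)
    p.wait()  # wait for the conversion to finish before the file is closed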
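The same redirection can also be done without a shell, which avoids quoting problems when file names contain spaces. This is a minimal sketch, not the answerer's code: `convert_file` is a hypothetical helper, `converter` is whatever command you use given as a list (for example `["html2text"]`), and it assumes that command prints its result to stdout.

```python
import subprocess


def convert_file(converter, html_path, txt_path):
    """Run `converter html_path` and save its stdout to txt_path.

    `converter` is a list such as ["html2text"]; that the tool writes its
    result to stdout is an assumption of this sketch.
    """
    with open(txt_path, "wb") as out:
        # run() blocks until the command exits. No shell is involved, so the
        # file name is passed to the program as a single argument.
        result = subprocess.run(converter + [html_path], stdout=out)
    return result.returncode
```

In the question's loop this would be called as `convert_file(["html2text"], i + ".html", i + ".txt")`, after the HTML file has been closed.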

Answer 1: (score: 0)

Why not use html2text as a Python library?

import html2text

h = html2text.HTML2Text()
txt = h.handle(open(infile).read())  # infile: path to the saved .html file
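Applied to the whole set of files, the library approach might look roughly like this. It is a sketch under a few assumptions: the helper name `html_files_to_text` is made up, the files are assumed to be UTF-8, and the third-party `html2text` package is only imported when no other `convert` function is passed in.

```python
from pathlib import Path


def html_files_to_text(paths, convert=None):
    """Write a .txt file next to each .html file in paths; return the new paths."""
    if convert is None:
        # Assumes the third-party html2text package is installed.
        import html2text
        convert = html2text.HTML2Text().handle

    written = []
    for path in map(Path, paths):
        # Assumption: the saved pages are UTF-8 encoded.
        text = convert(path.read_text(encoding="utf-8"))
        out_path = path.with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

With the question's variables this would be `html_files_to_text([i + ".html" for i in obj])`, and no subprocess is needed at all.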