Question

I'm trying to save the page source to a file so that I don't have to keep re-running my code every time I want to test something.

I have:

html_source = driver.page_source
soup = BeautifulSoup(html_source, 'lxml') # added `lxml` only b/c I got a warning saying I should
soup = soup.prettify()
with open('pagesource.html', 'wb') as f_out:
    f_out.write(soup)

The error I get is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 223871: ordinal not in range(128)

I also tried f_out.write(str(soup)), but that didn't work either.

How can I write the content to a file?
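A likely explanation, sketched with the standard library only: the u'\xab' in the traceback suggests Python 2, where prettify() returns a unicode string, and writing unicode to a file opened with 'wb' triggers an implicit ASCII encode that fails on any non-ASCII character. str(soup) fails for the same reason, because by that point soup is already the unicode result of prettify(). Encoding explicitly to UTF-8 avoids the problem. The string below is a hypothetical stand-in for real soup output, and the file goes to a temporary directory; with BeautifulSoup itself, soup.prettify(encoding='utf-8') should likewise return bytes that can be written to a file opened with 'wb'.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the output of soup.prettify(): text containing
# u'\xab' (and u'\xbb'), the kind of character named in the traceback.
html_text = u'<p>\xab quoted \xbb</p>'

# Encoding with the ASCII codec reproduces the error from the question.
try:
    html_text.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc.reason

# Opening the file with an explicit encoding replaces the implicit ASCII step;
# io.open accepts the same arguments on Python 2 and Python 3.
out_path = os.path.join(tempfile.gettempdir(), 'pagesource.html')
with io.open(out_path, 'w', encoding='utf-8') as f_out:
    f_out.write(html_text)

# Reading back with the same encoding returns the original text.
with io.open(out_path, 'r', encoding='utf-8') as f_in:
    round_trip = f_in.read()
```

When reading the saved file later (for example to feed it back into BeautifulSoup), open it with the same encoding='utf-8'.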

Answer 1

BeautifulSoup is for parsing HTML, not for fetching it. If you can import urllib, try urlretrieve:

import urllib
urllib.urlretrieve("http://www.example.com/test.html", "test.txt")
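The code in this answer is Python 2. In Python 3, urlretrieve lives in urllib.request, where it is kept as a legacy interface. A minimal sketch assuming Python 3; to stay runnable offline it serves a hypothetical local file through a file:// URL in place of http://www.example.com/test.html.

```python
import pathlib
import tempfile
from urllib.request import urlretrieve

# Hypothetical local page standing in for http://www.example.com/test.html.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
page = tmp_dir / "test.html"
page.write_text("<html><body>\u00ab hello \u00bb</body></html>", encoding="utf-8")

# urlretrieve copies the resource's raw bytes into the target file and
# returns a (filename, headers) tuple.
filename, headers = urlretrieve(page.as_uri(), str(tmp_dir / "test.txt"))
```

Because the bytes are copied unchanged, no text encoding step is involved, so non-ASCII characters cannot cause a UnicodeEncodeError here.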

Answer 2

This works for me:

import urllib2

html = urllib2.urlopen('http://www.example.com').read()

Now html contains the source code of that URL.

with open('web.html', 'w') as f:
    f.write(html)

You should now be able to open it in a browser.
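For Python 3, where urllib2 became urllib.request, a minimal sketch of the same idea: read() returns bytes, so writing them to a file opened in binary mode ('wb') involves no encoding step and cannot raise UnicodeEncodeError. It assumes Python 3 and uses a hypothetical local file:// URL instead of http://www.example.com so that it runs offline.

```python
import pathlib
import tempfile
from urllib.request import urlopen

# Hypothetical local page standing in for 'http://www.example.com'.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
source_page = tmp_dir / "source.html"
source_page.write_text("<html><body>\u00ab caf\u00e9 \u00bb</body></html>", encoding="utf-8")

# In Python 3, read() returns bytes rather than text.
with urlopen(source_page.as_uri()) as response:
    html = response.read()

# Binary mode writes the bytes as-is, with no implicit encoding.
out_path = tmp_dir / "web.html"
with open(out_path, "wb") as f:
    f.write(html)
```

One caveat: urlopen returns the HTML as the server sends it, while Selenium's driver.page_source reflects the page after the browser has run its JavaScript, so the two files can differ on script-heavy sites.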

Python + Beautiful Soup: writing HTML source to a file
