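Beautiful Soup: prettifying HTML to a txt file gives an encoding error

Asked: 2014-07-10 11:09:19

Tags: python-2.7 encoding utf-8 beautifulsoup

I am trying to save the prettified output of an HTML file to a txt file, but I get this error message: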

Traceback (most recent call last):
  File "prettyhtmlfiles.py", line 16, in <module>
    file.write(soup.prettify())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 8532: ordinal not in range(128)

How can I fix this?

My code:

import urllib2
import os
from bs4 import BeautifulSoup
import csv

url = "/home/sveisa/S141test/ayuki.html"
with open(url, 'r') as f:
    data = f.read()
    soup = BeautifulSoup(open('/home/sveisa/S141test/ayuki.html').read())

print(soup.prettify())


file = open("newfile.txt", "w")

file.write(soup.prettify())

1 answer:

Answer 0 (score: 2)

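Try this; it should work. In Python 2, soup.prettify() returns a unicode string, and writing it to a file opened with the built-in open() triggers an implicit ASCII encode, which fails on non-ASCII characters such as u'\xbb'. Encoding the string to UTF-8 explicitly before writing avoids that: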

print >> file, (soup.prettify().encode('utf-8'))
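
As an alternative that is not part of the original answer, the file itself can be opened with an explicit encoding so that unicode text is encoded on write. The following is a minimal sketch under stated assumptions: save_text is a hypothetical helper name, and the pretty string is a stand-in for the result of soup.prettify() so the snippet runs without the original HTML file. io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io


def save_text(path, text, encoding="utf-8"):
    """Write a unicode string to path using an explicit encoding.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # io.open returns a text-mode file that encodes unicode on write,
    # so no implicit ASCII conversion takes place.
    with io.open(path, "w", encoding=encoding) as out:
        out.write(text)


# Stand-in for soup.prettify(); contains the character from the traceback.
pretty = u"<p>\n \xbb next\n</p>\n"
save_text("newfile.txt", pretty)
```

With the code from the question, the call would presumably be save_text("newfile.txt", soup.prettify()). The with block also closes the file, which the original script never does.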