改变'变成正常人物

时间:2012-09-28 19:27:59

标签: python html character-encoding

我在显示内容时遇到问题, 我的计划:

#! /usr/bin/python

import urllib
import re

url = "http://yahoo.com"
pattern = '''<span class="medium item-label".*?>(.*)</span>'''

website = urllib.urlopen(url)
pageContent = website.read()
result = re.findall(pattern, pageContent)

for record in result:
    print record

输出:

Masked teen killed by dad
First look in &#39;Hotel of Doom&#39;
Ex-NFL QB&#39;s sad condition
Reporter ignores warning
Romney&#39;s low bar for debates

所以问题是我应该在代码中包含什么才能将&#39变换为字符

3 个答案:

答案 0 :(得分:11)

在Python2中:

In [16]: text = 'Ex-NFL QB&#39;s sad condition'

In [17]: import HTMLParser

In [18]: parser = HTMLParser.HTMLParser()

In [19]: parser.unescape(text)
Out[19]: u"Ex-NFL QB's sad condition"

在Python3中:

import html.parser as htmlparser
parser = htmlparser.HTMLParser()
parser.unescape(text)

答案 1 :(得分:0)

在Javascript中:

    text = text.replace(/&#39;/g,"'");

答案 2 :(得分:0)

对于python 3:

AWS_WEBPROXY_HOST=... 
AWS_WEBPROXY_PORT=...