I'm having a problem displaying the content. My program:
#! /usr/bin/python
import urllib
import re
url = "http://yahoo.com"
pattern = '''<span class="medium item-label".*?>(.*)</span>'''
website = urllib.urlopen(url)
pageContent = website.read()
result = re.findall(pattern, pageContent)
for record in result:
    print record
Output:
Masked teen killed by dad
First look in &#39;Hotel of Doom&#39;
Ex-NFL QB&#39;s sad condition
Reporter ignores warning
Romney&#39;s low bar for debates
So the question is: what should I add to my code to convert &#39; into the actual character?
Answer 0 (score: 11)
In Python 2:
In [16]: text = 'Ex-NFL QB&#39;s sad condition'
In [17]: import HTMLParser
In [18]: parser = HTMLParser.HTMLParser()
In [19]: parser.unescape(text)
Out[19]: u"Ex-NFL QB's sad condition"
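In Python 3:
import html
text = 'Ex-NFL QB&#39;s sad condition'
# html.unescape (Python 3.4+) replaces HTMLParser().unescape,
# which was deprecated in 3.4 and removed in 3.9
html.unescape(text)  # returns "Ex-NFL QB's sad condition"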
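A minimal Python 3 sketch of how the decoding step could slot into the question's scraping loop. To keep it runnable offline, it assumes a small hypothetical sample of the page markup instead of fetching http://yahoo.com (the real page structure is an assumption and has surely changed). It also swaps the question's greedy `(.*)` for a non-greedy `(.*?)`, so that two spans on the same line would not be merged into one match; that change is this sketch's choice, not part of the original answer.

```python
import html
import re

# Hypothetical sample markup standing in for the downloaded page.
page_content = (
    '<span class="medium item-label">Masked teen killed by dad</span>\n'
    '<span class="medium item-label">Ex-NFL QB&#39;s sad condition</span>\n'
    '<span class="medium item-label">Romney&#39;s low bar for debates</span>\n'
)

# Question's pattern, but with a non-greedy capture group (assumption).
pattern = r'<span class="medium item-label".*?>(.*?)</span>'


def extract_labels(page):
    """Return the text of each matching span with HTML entities decoded."""
    return [html.unescape(raw) for raw in re.findall(pattern, page)]


for label in extract_labels(page_content):
    print(label)
```

With a live page, `page_content` would come from `urllib.request.urlopen(url).read().decode(...)` instead; decoding entities per match keeps the regex working on the raw markup.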
Answer 1 (score: 0)
In JavaScript:
text = text.replace(/&#39;/g, "'");
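A targeted string replacement like the one in this answer only handles the single entity it names. The sketch below (Python, with a made-up sample string) contrasts that narrow approach with `html.unescape`, which decodes named, decimal and hexadecimal character references alike.

```python
import html

# Made-up sample containing several kinds of character references.
text = "Tom &amp; Jerry&#39;s &quot;show&quot; &#x27;live&#x27;"

# Narrow fix, mirroring a single-entity replace: only "&#39;" changes.
narrow = text.replace("&#39;", "'")

# General fix: decodes &amp;, &quot;, &#39; and &#x27; in one call.
general = html.unescape(text)

print(narrow)   # Tom &amp; Jerry's &quot;show&quot; &#x27;live&#x27;
print(general)  # Tom & Jerry's "show" 'live'
```

Note that `&#x27;` is the same apostrophe as `&#39;` written in hex, so the narrow replace misses it even though it encodes the identical character.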