import urllib.parse

import requests
from lxml import etree

url = "http://weibo.cn/*******"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
# Download the page and parse the raw response bytes with lxml
lxml = requests.get(url, headers=headers).content
selector = etree.HTML(lxml)
i = selector.xpath('//span[@class="ctt"]')
for each in i:
    # Take the text of each matched span and encode it as UTF-8 bytes
    a = each.xpath('string(.)').encode('utf-8', 'ignore')
    print(a)
    # Attempt to turn the bytes back into readable text
    x = urllib.parse.unquote(str(a)).encode('utf-8')
    b = x.encode('latin-1').encode('gbk').decode('utf-8')
    print(b)
The above is my code.
The output looks like this: \xb2\xbd\xa6\xa7
and so on.
These look like bytes.
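Can someone tell me how to convert them to UTF-8 text, that is, to a normal string?
I'm a newbie who is interested in Python and is learning how to write a web spider (spider.py).
Thanks!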
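
For reference, here is a minimal sketch of the bytes-to-string conversion being asked about. It assumes Python 3 and that the bytes really are UTF-8. The helper name `to_text` and the sample text are hypothetical and are not part of the code above. In Python 3, `xpath('string(.)')` should already return a `str`, so the extra `.encode('utf-8')` is most likely what makes `print` show `\x..` escapes. A `bytes` object is turned back into text with `.decode()`, not with another `.encode()`.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
        return value.decode(encoding, errors="replace")
    return value


# Made-up sample standing in for the text of one <span class="ctt"> element
sample = "你好"

raw = sample.encode("utf-8")   # str -> bytes
print(raw)                     # b'\xe4\xbd\xa0\xe5\xa5\xbd'

text = to_text(raw)            # bytes -> str
print(text)                    # readable text again
```

If the page turned out to be encoded as GBK rather than UTF-8, the same call with `encoding="gbk"` would be the thing to try.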