Question

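I'm very confused about unicode in Python 2.x.

I'm using BeautifulSoup to scrape a web page, and I'm trying to insert what I find into a dictionary, with the name as the key and the URL as the value.

I'm using BeautifulSoup's find function to get the information I need. My code starts like this: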

name = i.find('a').string
url = i.find('a').get('href')

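This works, except that the thing returned from find is an object, not a string.

Here is what confuses me.

If I try to convert it to str before assigning it to a variable, it sometimes throws a UnicodeEncodeError: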

'ascii' codec can't encode character u'\xa0' in position 5: ordinal not in range(128)

I googled around and found that I should encode to ascii.

I tried adding:

print str(i.find('a').string).encode('ascii', 'ignore')

No luck; it still gave a Unicode error.

After that, I tried using repr.

print repr(i.find('a').string)

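This works... almost!

I ran into a new problem here.

Once everything is done and the dictionary is built, I can't access anything! It keeps giving me a KeyError.

I can iterate over the dict: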

for i in sorted(data.iterkeys()):
    print i


>>> u'Key1'
>>> u'Key2'
>>> u'Key3'
>>> u'Key4'

But if I try to access a dict item like this:

print data['key1']

OR

print data[u'key1']

OR

test = unicode('key1')
print data[test]

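They all raise a KeyError, which is 100% confusing to me. I think it has something to do with them being Unicode objects.

I've tried everything I can think of, but I can't figure out what's going on.

Oh! What's even stranger is that this code: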

name = repr(i.find('a').string)
print type(name)

returns

>>> <type 'str'>

but if I just print that thing

print name

it displays it as a unicode string

>>>> u'string name'

Answer 1

The .string value is indeed not a string. You need to cast it to unicode():

name = unicode(i.find('a').string)

It is a unicode-like object called NavigableString. If you really need a str instead, you can encode it from there:

name = unicode(i.find('a').string).encode('utf8')

or similar. For use in a dict, I'd use unicode objects rather than encoded strings.
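As a rough illustration of why the repr() keys from the question fail while plain unicode keys work, here is a minimal sketch. The name and url values are made-up stand-ins for what BeautifulSoup would return, and BeautifulSoup itself is left out so the snippet runs on its own:

```python
# Hypothetical stand-ins for the values scraped in the question.
name = u'Key1'
url = u'http://example.com/1'

# Keyed by repr(): the key is the printed form of the string, quotes
# included (u'Key1' on Python 2, 'Key1' on Python 3), not the text itself.
by_repr = {repr(name): url}

# Keyed by the text itself, as suggested above.
by_text = {name: url}

repr_lookup_works = name in by_repr      # False: the quotes are part of the key
text_lookup_works = name in by_text      # True
case_lookup_works = u'key1' in by_text   # False: dict keys are case-sensitive

print(repr_lookup_works, text_lookup_works, case_lookup_works)
```

Note also that the lookups in the question use a lowercase 'key1' while the printed keys are capitalized. If that is not just a typo in the post, it would cause a KeyError on its own.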

To understand the difference between unicode() and str(), and which encoding to use, I recommend you read the Python Unicode HOWTO.
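To make that distinction a little more concrete, the sketch below reproduces the failure from the question without BeautifulSoup. The sample text is invented; only the non-breaking space u'\xa0' is taken from the error message. In Python 2, str() on a unicode object encodes with the ascii codec implicitly, and that step is spelled out explicitly here:

```python
# Invented sample text containing the non-breaking space from the error message.
text = u'Some\xa0name'

# What Python 2's str(text) does implicitly: encode with the 'ascii' codec.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly works once the encoding can represent the character,
# or once an error handler is supplied.
as_utf8 = text.encode('utf8')               # character preserved as two bytes
as_ascii = text.encode('ascii', 'ignore')   # character silently dropped

# Decoding the UTF-8 bytes gives back the original text.
round_trip = as_utf8.decode('utf8')

print(ascii_failed, round_trip == text)
```

This also suggests why the str(...).encode('ascii', 'ignore') attempt in the question still failed: str() raised before .encode() was ever called.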