Question

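I am using BeautifulSoup to parse some XML files. One of the fields in these files frequently contains Unicode characters. I have tried, without success, to write the Unicode text to a file using encode.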

So far, the process is basically:

Fetch the name:

gamename = items.find('name').string.strip()

Then combine the name into a tuple, which is later converted to a string:

stringtoprint = userid, gamename.encode('utf-8')

newstring = "INSERT INTO collections VALUES" + str(stringtoprint) + ";" + "\n"

This string is then written to a file:

listofgamesowned.write(newstring.encode("UTF-8"))

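It seems I shouldn't have to encode quite so often. I tried encoding immediately after parsing out the name, e.g. gamename = items.find('name').string.strip().encode('utf-8'), but this did not seem to work.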

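Currently, 'Uudet L\xc3\xb6yt\xc3\xb6retket' is being printed and saved, rather than Uudet Löytöretket.

If this were a string I was generating myself, I would use something.write(u'Uudet L\xc3\xb6yt\xc3\xb6retket'); however, it is an element embedded in a string.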

Answer 1

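Unicode strings are the in-memory representation of text. When you write text out you need to encode it, and when you read it in you need to decode it.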
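A minimal sketch of that round trip, assuming Python 3 (where str holds Unicode text and bytes holds encoded data). The variable names are illustrative only and do not come from the question:

```python
# Unicode text held in memory (reads: Uudet Löytöretket)
text = u"Uudet L\u00f6yt\u00f6retket"

# Encode when the text leaves the program, e.g. just before writing it out
encoded = text.encode("utf-8")

# Decode when data comes back in, e.g. just after reading it
decoded = encoded.decode("utf-8")

# The encoded form is the byte sequence seen in the question
assert encoded == b"Uudet L\xc3\xb6yt\xc3\xb6retket"
assert decoded == text
```

The two assertions show that the escaped form and the readable form are the same name, once as UTF-8 bytes and once as Unicode text.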

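Uudet L\xc3\xb6yt\xc3\xb6retket is the UTF-8 encoded version of Uudet Löytöretket, so that is what you want to write out. When you read the string back in from a file, you need to decode it, which gives:

Uudet Löytöretket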
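One plausible reason the escapes end up in the output as literal backslash sequences is that the question calls str() on a tuple, and str() of a tuple uses repr() of each element. The sketch below assumes Python 3, and the userid value is hypothetical:

```python
userid = 42  # hypothetical value, not taken from the question
gamename = u"Uudet L\u00f6yt\u00f6retket"

# Encoding early puts bytes inside the tuple; str() then embeds the
# repr() of those bytes, including the \x escapes as literal characters.
as_bytes = str((userid, gamename.encode("utf-8")))

# Keeping the name as Unicode text leaves the readable characters intact
# (in Python 3; Python 2 would show the repr of a unicode object instead).
as_text = str((userid, gamename))

assert "\\xc3\\xb6" in as_bytes
assert u"L\u00f6yt\u00f6retket" in as_text
```

Under these assumptions, formatting the fields explicitly, rather than converting the whole tuple with str(), avoids the repr() step altogether.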

Just remember to encode immediately before you output and decode immediately after you read.
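Applied to the script in the question, that rule could look roughly like the sketch below. It assumes Python 3 (io.open also exists in Python 2.6+), uses a hypothetical userid and a temporary file path, and lets io.open do the encoding and decoding at the file boundary:

```python
import io
import os
import tempfile

userid = 42                                 # hypothetical value
gamename = u"Uudet L\u00f6yt\u00f6retket"   # Unicode text, as returned by the parser

# Build the statement as Unicode text; no .encode() calls at this stage
newstring = u"INSERT INTO collections VALUES ({0}, '{1}');\n".format(userid, gamename)

# Hypothetical output location for this sketch
path = os.path.join(tempfile.mkdtemp(), "listofgamesowned.sql")

# Encoding happens once, at write time
with io.open(path, "w", encoding="utf-8") as listofgamesowned:
    listofgamesowned.write(newstring)

# Decoding happens once, at read time
with io.open(path, "r", encoding="utf-8") as infile:
    read_back = infile.read()

assert read_back == newstring
```

With the encoding declared on the file object, the rest of the program only ever handles Unicode text.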