Question

XML header:

<?xml version="1.0" encoding="UTF-8"?><points>

XML data fragment:

<point>
<id>1781</id><lon>43.245766666667</lon><lat>56.636883333333</lat>
<type>vert</type><last_update>2016-11-18 22:55:11</last_update>
<active>1</active><verified>1</verified><international>0</international><name>Vеrshilovo</name><name_ru>Вершилово</name_ru><city/><belongs>АОН</belongs><inde

Code:

tree = ET.parse(XMLFIL)
root = tree.getroot()
allpoints=root.findall('point')
for point in allpoints:
 id=point.find('id').text
 name=point.find('name').text.encode('utf8')
 print name

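This rewards me with "AttributeError: 'NoneType' object has no attribute 'encode'". If I omit the 'encode' I get the infamous "'ascii' codec can't encode character u'\u0435' in position 1: ordinal not in range(128)".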

Note that the error is on the 'e' of 'Vershilovo': it looks fine, but a hexdump of the XML data gives

00000000  56 e5 72 73 68 69 6c 6f  76 6f 0a                 |V.rshilovo.|

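I found several related questions, but none of them led me to a solution. The root cause is probably that my XML data is incorrectly encoded, but I have no control over that. I could live with having to reset the illegal values to some default such as "???".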
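The diagnosis above can be checked with a short sketch. It is written for Python 3 (the question's own code is Python 2), and the variable names are illustrative only. It lists the non-ASCII characters in the name and shows one possible way to substitute a placeholder instead of raising, assuming a single "?" per offending character is acceptable:

```python
import unicodedata

# The name as reported by the error message: position 1 holds U+0435,
# which looks like a Latin "e" but is a Cyrillic letter.
name = u"V\u0435rshilovo"

# Collect (position, Unicode name) for every character outside ASCII.
non_ascii = [(i, unicodedata.name(ch)) for i, ch in enumerate(name) if ord(ch) > 127]
print(non_ascii)

# errors="replace" substitutes "?" for each character ASCII cannot represent.
ascii_safe = name.encode("ascii", errors="replace")
print(ascii_safe)
```

On Python 3 the first print identifies the character as CYRILLIC SMALL LETTER IE, and the second yields a bytes object with "?" in its place.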

Answer 1

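It looks like some name elements have no text, so their text attribute is None. You can either use a try-except block or fall back to a default value when text is None, for example: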

name = (point.find('name').text or '').encode('utf8')

Another example, using an if statement:

name = point.find('name').text
if name:
    name = name.encode('utf8')
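Both variants above can be folded into one small helper. The following is a minimal sketch for Python 3, not a definitive implementation: child_text and SAMPLE are hypothetical names introduced here, and the sample data is made up to cover three cases (a name with text, an empty name element, and a point with no name element at all). The "???" default follows the fallback mentioned in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a normal name, an empty <name/>, and a missing <name>.
SAMPLE = (
    "<points>"
    "<point><id>1781</id><name>Vershilovo</name></point>"
    "<point><id>1782</id><name/></point>"
    "<point><id>1783</id></point>"
    "</points>"
)

def child_text(parent, tag, default="???"):
    """Return the text of the first child named `tag`, or `default`."""
    child = parent.find(tag)
    # find() returns None when the element is missing;
    # .text is None when the element exists but is empty.
    if child is None or child.text is None:
        return default
    return child.text

root = ET.fromstring(SAMPLE)
names = [child_text(point, "name") for point in root.findall("point")]
print(names)
```

ElementTree's own findtext('name', default='???') covers the missing-element case, but it returns an empty string for an element that exists without text, so an explicit check like the one above is still needed if empty elements should also get the default.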

xml parsing and unicode (again)
