Question

When parsing a simple XML document (encoded as UTF-8), xml.etree.ElementTree.fromstring raises UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1022-1023: invalid continuation byte

Here is my code:

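import xml.etree.ElementTree as ET

# posts_path is defined elsewhere and points to the XML file
count = 0

# Use a context manager so the file is closed after reading
with open(posts_path, "r", encoding="utf-8") as posts_file:
    line = posts_file.read()

root = ET.fromstring(line)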
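
Because the file is opened in text mode with encoding="utf-8", the decoding may already be failing inside read() rather than inside the parser. To check which bytes are being rejected, a minimal diagnostic sketch could look like the following. The helper name find_invalid_utf8, its context parameter, and the sample bytes are hypothetical and only for illustration; they are not taken from the real file.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes, context: int = 20) -> Optional[Tuple[int, int, bytes]]:
    """Return (start, end, nearby bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the bytes the codec could not decode
        lo = max(exc.start - context, 0)
        return exc.start, exc.end, data[lo:exc.end + context]
    return None


# Hypothetical sample: "é" stored as the single Latin-1 byte 0xE9,
# which is not a complete UTF-8 sequence in this position.
sample = b'<row Name="caf\xe9" />'
print(find_invalid_utf8(sample))
```

Against the real file, the same helper could be called on the raw bytes obtained by opening posts_path in "rb" mode, which should show what is actually stored around position 1022.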

Here is the XML file:

<row Id="376095" 
PostTypeId="2" ParentId="376081" 
CreationDate="2008-12-17T21:28:45.560" 
Score="103" 
Body="&lt;pre&gt;&lt;code&gt;$('#mytable tr').each(function() {&#xA;    var customerId = $(this).find(&quot;td:first&quot;).html();    &#xA;});&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&#xA;&lt;p&gt;What you are doing is iterating through all the trs in the table, finding the first td in the current tr in the loop, and extracting its inner html.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;To select a particular cell, you can reference them with an index:&lt;/p&gt;&#xA;&#xA;&lt;pre&gt;&lt;code&gt;$('#mytable tr').each(function() {&#xA;    var customerId = $(this).find(&quot;td&quot;).eq(2).html();    &#xA;});&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&#xA;&lt;p&gt;In the above code, I will be retrieving the value of the &lt;strong&gt;third row&lt;/strong&gt; (the index is zero-based, so the first cell index would be 0)&lt;/p&gt;&#xA;&#xA;&lt;hr&gt;&#xA;&#xA;&lt;p&gt;Here's how you can do it without jQuery:&lt;/p&gt;&#xA;&#xA;&lt;pre&gt;&lt;code&gt;var table = document.getElementById('mytable'), &#xA;    rows = table.getElementsByTagName('tr'),&#xA;    i, j, cells, customerId;&#xA;&#xA;for (i = 0, j = rows.length; i &amp;lt; j; ++i) {&#xA;    cells = rows[i].getElementsByTagName('td');&#xA;    if (!cells.length) {&#xA;        continue;&#xA;    }&#xA;    customerId = cells[0].innerHTML;&#xA;}&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&#xA;&lt;p&gt;&lt;/p&gt;&#xA;" 
OwnerUserId="44084" 
OwnerDisplayName="Dreas" 
LastEditorUserId="880797" 
LastEditorDisplayName="Dreas" 
LastEditDate="2011-11-04T16:25:28.717" 
LastActivityDate="2011-11-04T16:25:28.717" 
CommentCount="6" />

I am using Python 3.6.2.

xml.etree.ElementTree raises UnicodeDecodeError when parsing

0 answers: