Question

I'm trying to parse an XML file that contains some text with quotation marks in it.

Here is one line from the XML file as a sample.

<Video ratingKey="7459" key="/library/metadata/7459" studio="Paramount Pictures" type="movie" title=""Crocodile" Dundee" contentRating="PG-13" summary="When a New York reporter plucks crocodile hunter Dundee from the Australian Outback for a visit to the Big Apple, it's a clash of cultures and a recipe for good-natured comedy as naïve Dundee negotiates the concrete jungle. Dundee proves that his instincts are quite useful in the city and adeptly handles everything from wily muggers to high-society snoots without breaking a sweat." rating="6.3" year="1986" tagline="The Wizard of Auz hits The Big Apple!" thumb="/library/metadata/7459/thumb/1382989284" art="/library/metadata/7459/art/1382989284" duration="5352480" originallyAvailableAt="1986-04-24" addedAt="1382987525" updatedAt="1382989284">

When I use this simple code to read the XML file, I get an error:

import xml.etree.ElementTree as ET  
tree = ET.parse('MovieList After HD Crash.txt')  
root = tree.getroot()  
print(root.tag)
print(root.attrib)

The error is xml.etree.ElementTree.ParseError: not well-formed (invalid token): line ..., column ...

Is there a different way to parse an XML file whose lines contain these extra quotes?

Rob.

Answer 1

You need to use &quot; to escape the double quotes inside the attribute value.

title=""Crocodile" Dundee"

becomes

title="&quot;Crocodile&quot; Dundee"
