I am trying to parse an XML file that contains some text with quotation marks in it.
Here is one line of the XML file as a sample.
<Video ratingKey="7459" key="/library/metadata/7459" studio="Paramount Pictures" type="movie" title=""Crocodile" Dundee" contentRating="PG-13" summary="When a New York reporter plucks crocodile hunter Dundee from the Australian Outback for a visit to the Big Apple, it's a clash of cultures and a recipe for good-natured comedy as naïve Dundee negotiates the concrete jungle. Dundee proves that his instincts are quite useful in the city and adeptly handles everything from wily muggers to high-society snoots without breaking a sweat." rating="6.3" year="1986" tagline="The Wizard of Auz hits The Big Apple!" thumb="/library/metadata/7459/thumb/1382989284" art="/library/metadata/7459/art/1382989284" duration="5352480" originallyAvailableAt="1986-04-24" addedAt="1382987525" updatedAt="1382989284">
When I read the XML file with this simple code, I get an error:
import xml.etree.ElementTree as ET
tree = ET.parse('MovieList After HD Crash.txt')
root = tree.getroot()
print root.tag
print root.attrib
The error is xml.etree.ElementTree.ParseError: not well-formed (invalid token): line ..., column ...
Is there a different way to parse an XML file whose lines contain these extra quotes?
Rob.
Answer 0 (score: 0)
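You need to use &quot; to escape double quotes inside an attribute value. An unescaped " ends the value early, which is why the parser reports an invalid token. So
title=""Crocodile" Dundee"
becomes
title="&quot;Crocodile&quot; Dundee"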
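If the file cannot be regenerated with properly escaped values, one option is to repair the stray quotes in the text before handing it to the parser. The following is a minimal sketch, not a general XML fixer. The helper name escape_inner_quotes and the pattern ATTR_RE are made up for this illustration. It assumes every attribute uses double quotes, that each attribute value sits on a single line, and that a real closing quote is always followed by either another name= pair or the end of the tag.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic (an assumption, not part of any library): an attribute value
# runs from =" up to the first " that is followed by another name= pair
# or by the end of the tag (>, /> or ?>).
ATTR_RE = re.compile(r'([\w:.-]+)="(.*?)"(?=\s+[\w:.-]+=|\s*/?>|\s*\?>)')


def escape_inner_quotes(text):
    """Replace double quotes found inside attribute values with &quot;."""
    def fix(match):
        name, value = match.group(1), match.group(2)
        return '{}="{}"'.format(name, value.replace('"', '&quot;'))
    return ATTR_RE.sub(fix, text)


# Shortened version of the line from the question.
sample = '<Video ratingKey="7459" title=""Crocodile" Dundee" year="1986"/>'
root = ET.fromstring(escape_inner_quotes(sample))
print(root.attrib['title'])
```

For the file in the question, the same idea would be to read its whole contents as text, pass them through escape_inner_quotes, and give the result to ET.fromstring instead of calling ET.parse on the file directly. The heuristic can misfire if a value itself contains text that looks like " name=, so it is worth checking the repaired output.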