Error when parsing XML from a file but not from a string? (Python)

Asked: 2018-03-31 09:24:31

Tags: python xml parsing xmltodict

I am trying to use xmltodict to parse a large number of XML files so that I can convert them to dataframes. However, when I try to parse an actual XML file, I get the error:

ExpatError: not well-formed (invalid token): line 1, column 5

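The error is exactly the same for every XML file, including the "line 1, column 5" part, even though the files differ greatly in length. Structurally they are all the same.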

When I copy the contents of an XML file into Python as a string, parsing it with xmltodict works perfectly. For example:

xmlstr = """<?xml version="1.0" encoding="utf-8"?>
<document id="DDI-DrugBank.d200">
    <sentence id="DDI-DrugBank.d200.s0" text="Co-administration of probenecid with acyclovir has been shown to increase the mean half-life and the area under the concentration-time curve.">
        <entity id="DDI-DrugBank.d200.s0.e0" charOffset="21-30" type="drug" text="probenecid"/>
        <entity id="DDI-DrugBank.d200.s0.e1" charOffset="37-45" type="drug" text="acyclovir"/>
        <pair id="DDI-DrugBank.d200.s0.p0" e1="DDI-DrugBank.d200.s0.e0" e2="DDI-DrugBank.d200.s0.e1" ddi="true" type="mechanism"/>
    </sentence>
    <sentence id="DDI-DrugBank.d200.s1" text="Urinary excretion and renal clearance were correspondingly reduced."/>
    <sentence id="DDI-DrugBank.d200.s2" text="The clinical effects of this combination have not been studied."/>
</document>"""

import xmltodict as x2d
nestdict1 = x2d.parse('Train/DrugBank/Aciclovir_ddi.xml')
nestdict2 = x2d.parse(xmlstr)

In the example above, nestdict1 raises the error while nestdict2 works fine, even though xmlstr is a direct copy and paste of the file's contents.

1 Answer:

Answer 0 (score: 1)

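You need to pass a file object, not a string containing the file name. A plain string is treated as the XML content itself, so the path is parsed as if it were XML. That is why the error is the same no matter what the file contains.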

From the docs:

In [4]:print(xmltodict.parse.__doc__)
Parse the given XML input and convert it into a dictionary.

    `xml_input` can either be a `string` or a file-like object.
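To illustrate why a path string fails, here is a minimal sketch that uses only the standard library's expat parser, which is where the ExpatError in the question comes from. The helper name expat_error is made up for this illustration, and the tiny XML document is a stand-in, not the asker's file.

```python
from xml.parsers import expat


def expat_error(data):
    """Return the expat error message for `data`, or None if it parses.

    Hypothetical helper, for illustration only.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        return str(exc)
    return None


# The path is not XML, so parsing it as content fails.
path_result = expat_error(b"Train/DrugBank/Aciclovir_ddi.xml")

# A small stand-in document parses without error.
xml_result = expat_error(
    b'<?xml version="1.0" encoding="utf-8"?><document id="d200"/>'
)

print(path_result)
print(xml_result)
```

The first call never touches the file system. It only sees the characters of the path, so every file name under the same directory would be expected to fail in the same way.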

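So, open the file to get a file object. Binary mode is used here because, on Python 3, the underlying expat parser expects read() to return bytes:

fd = open("Train/DrugBank/Aciclovir_ddi.xml", "rb")

Then pass it to the parse method and close the file when you are done:

nestdict1 = x2d.parse(fd)
fd.close()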
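Since the question involves a large number of files, the same idea can be wrapped in a loop. The following is only a sketch: parse_xml_dir is a hypothetical helper, and it takes the parsing function as an argument so that the block itself depends only on the standard library.

```python
from pathlib import Path


def parse_xml_dir(directory, parse):
    """Parse every *.xml file in `directory` with the callable `parse`.

    Hypothetical helper: `parse` is any function that accepts a binary
    file object, for example xmltodict.parse.
    Returns a dict mapping each file name to its parsed result.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.xml")):
        # Open in binary mode and let the parser handle the encoding
        # declared in the XML header; the file is closed automatically.
        with path.open("rb") as fh:
            results[path.name] = parse(fh)
    return results
```

Assuming xmltodict is installed, parse_xml_dir("Train/DrugBank", xmltodict.parse) should return one nested dictionary per file, keyed by file name.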