I have an XML file like this:
'''some non ascii character'''
<b:FatturaElettronica xmlns:b="#">
<FatturaElettronicaHeader>
<DatiTrasmissione>
<IdTrasmittente>
<IdPaese>IT</IdPaese>
I need to remove everything up to
<FatturaElettronicaHeader>
My current code is:
import xml.etree.ElementTree as ET
import xml.etree.ElementTree as ETree
from lxml import etree
parser = etree.XMLParser(encoding='utf-8', recover=True, remove_comments=True, resolve_entities=False)
tree = ETree.parse('test.xml', parser)
root = tree.getroot()
print etree.tostring(root)
which gives me:
Traceback (most recent call last):
File "xml2.py", line 14, in <module>
print etree.tostring(root)
File "src/lxml/etree.pyx", line 3350, in lxml.etree.tostring
TypeError: Type 'NoneType' cannot be serialized.
How can I strip off the first part of the XML file?
TY
Answer 0 (score: 0)
You can use the find() function to search for the first angle bracket and parse only what follows it.
The rest of your XML file must be well-formed, though:
import xml.etree.ElementTree as ET

with open('...XMLFILE.xml', 'r') as file:
    filestring = file.read()

# Index of the first '<', i.e. where the XML markup begins
XML_start = filestring.find('<')
print(XML_start)  # gives 31

# Parse only the text from the first '<' onward
tree = ET.fromstring(filestring[XML_start:])
for i in tree.iter():
    print(i.tag)  # gives {#}FatturaElettronica, FatturaElettronicaHeader, ...
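If the leading junk is not valid text in your default encoding, reading the file in text mode may itself fail. A minimal sketch of the same find() idea on raw bytes follows. The helper name parse_after_junk and the sample bytes are made up for illustration, and the sample closes all tags so that it is well-formed:

```python
import xml.etree.ElementTree as ET


def parse_after_junk(raw):
    """Parse XML from bytes that may have non-XML junk before the first '<'.

    Hypothetical helper, not part of any library.
    """
    start = raw.find(b'<')
    if start == -1:
        # bytes.find() returns -1 when there is no '<' at all
        raise ValueError('no "<" found, so there is no XML to parse')
    return ET.fromstring(raw[start:])


# Illustrative input: a few junk bytes, then a well-formed document
sample = (
    b'\xff\xfesome junk'
    b'<b:FatturaElettronica xmlns:b="#">'
    b'<FatturaElettronicaHeader><DatiTrasmissione><IdTrasmittente>'
    b'<IdPaese>IT</IdPaese>'
    b'</IdTrasmittente></DatiTrasmissione></FatturaElettronicaHeader>'
    b'</b:FatturaElettronica>'
)

root = parse_after_junk(sample)
for el in root.iter():
    print(el.tag)
```

For a real file you would read it with open(path, 'rb') and pass the result of read() to the helper. This only skips junk that comes before the first '<'. It does not repair anything that is broken later in the document.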