I have a very strange problem with lxml. I am trying to parse my XML file with iterparse, as follows:
for event, elem in etree.iterparse(input_file, events=('start', 'end')):
    if elem.tag == 'tuv' and event == 'start':
        if elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'en':
            if elem.find('seg') is not None:
                write_in_some_file
        elif elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'de':
            if elem.find('seg') is not None:
                write_in_some_file
This is quite simple and works almost perfectly. It iterates over my XML file quickly: if the element is a tuv, it checks whether the language attribute is 'en' or 'de', then checks whether the element has a seg child, and if so writes the child's value to a file.
There is one <seg>, however, that seems not to exist in the file: <seg>! keine Spalten und Ventile</seg>.
I don't understand why this perfectly normal-looking tag causes a problem (I cannot use its .text). Note that all the other tags are found without trouble.
<tu tuid="235084307" datatype="Text">
  <prop type="score">1.67647</prop>
  <prop type="score-zipporah">0.6683</prop>
  <prop type="score-bicleaner">0.7813</prop>
  <prop type="lengthRatio">0.740740740741</prop>
  <tuv xml:lang="en">
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
    <seg>! no gaps and valves</seg>
  </tuv>
  <tuv xml:lang="de">
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
    <seg>! keine Spalten und Ventile</seg>
  </tuv>
</tu>
Answer 0 (score: 1)
I'm not sure this is what you are asking for (I'm fairly new to this myself), but
for event, elem in etree.iterparse('xml_try.txt', events=('start', 'end')):
    if elem.tag == 'tuv' and event == 'start':
        if elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'en':
            if elem.find('seg') is not None:
                print(elem[2].text)
        elif elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'de':
            if elem.find('seg') is not None:
                print(elem[2].text)
produces this output:
! no gaps and valves
! keine Spalten und Ventile
Again, apologies if this is not what you were asking for.
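Answer 1 (score: 1)
The lxml docs contain this warning:
WARNING: During the 'start' event, any content of the element, such as the descendants, following siblings or text, is not yet available and should not be accessed. Only attributes are guaranteed to be set.
Instead of using find() on the tuv to get the seg element, consider changing the "if" statement to match seg and the "end" event. You can then use getparent() to read the xml:lang attribute value from the parent tuv.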
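To make the event order concrete, here is a minimal sketch that records the events iterparse reports for a one-segment document. SAMPLE and trace_events are illustrative names that do not appear in the question, and the fallback import of xml.etree.ElementTree is only an assumption-friendly convenience so the snippet also runs where lxml is not installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical one-segment sample, modelled on the question's data.
SAMPLE = b'<tu><tuv xml:lang="de"><seg>! keine Spalten und Ventile</seg></tuv></tu>'


def trace_events(xml_bytes):
    """Return (event, tag) pairs in the order iterparse reports them."""
    source = io.BytesIO(xml_bytes)
    return [
        (event, elem.tag)
        for event, elem in etree.iterparse(source, events=("start", "end"))
    ]


for event, tag in trace_events(SAMPLE):
    print(event, tag)
```

The "end" event for seg is reported before the "end" event for its parent tuv. At that point the segment text has been parsed, and the parent has not been cleared yet, so its attributes can still be read.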
Example ("test.xml", with an additional "tu" element for testing)
<tus>
  <tu tuid="235084307" datatype="Text">
    <prop type="score">1.67647</prop>
    <prop type="score-zipporah">0.6683</prop>
    <prop type="score-bicleaner">0.7813</prop>
    <prop type="lengthRatio">0.740740740741</prop>
    <tuv xml:lang="en">
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
      <seg>! no gaps and valves</seg>
    </tuv>
    <tuv xml:lang="de">
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
      <seg>! keine Spalten und Ventile</seg>
    </tuv>
  </tu>
  <tu tuid="235084307A" datatype="Text">
    <prop type="score">1.67647</prop>
    <prop type="score-zipporah">0.6683</prop>
    <prop type="score-bicleaner">0.7813</prop>
    <prop type="lengthRatio">0.740740740741</prop>
    <tuv xml:lang="en">
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
      <seg>! no gaps and valves #2</seg>
    </tuv>
    <tuv xml:lang="de">
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
      <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
      <seg>! keine Spalten und Ventile #2</seg>
    </tuv>
  </tu>
</tus>
Python 3.x
from lxml import etree

for event, elem in etree.iterparse("test.xml", events=("start", "end")):
    if elem.tag == "seg" and event == "end":
        current_lang = elem.getparent().get("{http://www.w3.org/XML/1998/namespace}lang")
        if current_lang == "en":
            print(f"Writing en text \"{elem.text}\" to file...")
        elif current_lang == "de":
            print(f"Writing de text \"{elem.text}\" to file...")
        else:
            print(f"Unable to determine language. Not writing \"{elem.text}\" to any file.")
    if event == "end":
        elem.clear()
Printed output
Writing en text "! no gaps and valves" to file...
Writing de text "! keine Spalten und Ventile" to file...
Writing en text "! no gaps and valves #2" to file...
Writing de text "! keine Spalten und Ventile #2" to file...
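The print calls above stand in for the actual file writes. Below is a minimal sketch of the writing step, assuming one output file per language. It is a variant rather than the answer's exact method: it keeps the question's tuv check but moves it to the "end" event, where the seg child is complete, so getparent() is not needed. The names split_segments, XML_LANG and SAMPLE, as well as the output paths, are hypothetical, and the fallback to xml.etree.ElementTree is only there so the snippet runs without lxml.

```python
import io
import os
import tempfile

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample, shortened from the test.xml shown above.
SAMPLE = b"""<tus>
  <tu tuid="235084307" datatype="Text">
    <tuv xml:lang="en"><seg>! no gaps and valves</seg></tuv>
    <tuv xml:lang="de"><seg>! keine Spalten und Ventile</seg></tuv>
  </tu>
</tus>"""


def split_segments(source, out_paths):
    """Write each <seg> text to the file registered for its <tuv> language.

    `out_paths` maps a language code such as "en" to an output path.
    Returns the number of segments written per language.
    """
    counts = {lang: 0 for lang in out_paths}
    handles = {
        lang: open(path, "w", encoding="utf-8")
        for lang, path in out_paths.items()
    }
    try:
        # Only "end" events are requested: a finished <tuv> has all children.
        for _event, elem in etree.iterparse(source, events=("end",)):
            if elem.tag != "tuv":
                continue
            lang = elem.get(XML_LANG)
            text = elem.findtext("seg")
            if lang in handles and text is not None:
                handles[lang].write(text + "\n")
                counts[lang] += 1
            elem.clear()  # drop the finished <tuv>'s children and text
    finally:
        for handle in handles.values():
            handle.close()
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = {
            "en": os.path.join(tmp, "en.txt"),
            "de": os.path.join(tmp, "de.txt"),
        }
        print(split_segments(io.BytesIO(SAMPLE), paths))
```

Clearing each tuv after it has been handled keeps memory use down on large files, although the emptied elements themselves stay attached to the root. Languages that are not listed in out_paths are skipped silently in this sketch.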