I have a 10 GB XML file that contains a list of different blocks. Here is a snippet of my file:
<?xml version="1.0" encoding="UTF-8"?>
<?import java.lang.*?>
<?import javafx.scene.control.*?>
<?import javafx.scene.layout.*?>
<?import javafx.scene.layout.VBox?>
<GridPane xmlns:fx="http://javafx.com/fxml/1" xmlns="http://javafx.com/javafx/2.2" fx:controller="application.Main">
<children>
<GridPane>
<children>
<GridPane>
<children>
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Monday" wrapText="true" GridPane.columnIndex="1" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Tuesday" wrapText="true" GridPane.columnIndex="2" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Wednesday" wrapText="true" GridPane.columnIndex="3" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Thursday" wrapText="true" GridPane.columnIndex="4" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Friday" wrapText="true" GridPane.columnIndex="5" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Saturday" wrapText="true" GridPane.columnIndex="6" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Sunday" wrapText="true" GridPane.columnIndex="7" GridPane.rowIndex="0" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Monday" wrapText="true" GridPane.columnIndex="1" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Tuesday" wrapText="true" GridPane.columnIndex="2" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Wednesday" wrapText="true" GridPane.columnIndex="3" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Thursday" wrapText="true" GridPane.columnIndex="4" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Friday" wrapText="true" GridPane.columnIndex="5" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Saturday" wrapText="true" GridPane.columnIndex="6" GridPane.rowIndex="9" />
<TextArea editable="false" mouseTransparent="false" prefWidth="200.0" text="Sunday" wrapText="true" GridPane.columnIndex="7" GridPane.rowIndex="9" />
<Button id="prev" fx:id="prev2" mnemonicParsing="false" onAction="#ClickMinus" prefHeight="30.0" prefWidth="70.0" text="prev" GridPane.columnIndex="8" GridPane.rowIndex="0" />
<Button fx:id="next" mnemonicParsing="false" onAction="#ClickPlus" prefHeight="29.999900000002526" prefWidth="70.00009999999747" text="next" GridPane.columnIndex="0" GridPane.rowIndex="9" />
<Button fx:id="next2" mnemonicParsing="false" onAction="#ClickPlus" prefHeight="30.0" prefWidth="70.0" text="next" GridPane.columnIndex="8" GridPane.rowIndex="9" />
<Button fx:id="prev" mnemonicParsing="false" onAction="#ClickMinus" prefHeight="30.0" prefWidth="70.0" text="prev" GridPane.columnIndex="0" GridPane.rowIndex="0" />
<TextArea fx:id="week1" prefWidth="200.0" text="Week x Year x" wrapText="true" GridPane.columnIndex="0" GridPane.rowIndex="2" />
<TextArea fx:id="week2" prefWidth="200.0" text="Week x Year x" wrapText="true" GridPane.columnIndex="0" GridPane.rowIndex="4" />
<TextArea fx:id="week4" prefWidth="200.0" text="Week x Year x" wrapText="true" GridPane.columnIndex="0" GridPane.rowIndex="8" />
<Label fx:id="lab11" onMouseClicked="#labClick" prefHeight="40.0" prefWidth="150.0" text="" GridPane.columnIndex="1" GridPane.rowIndex="1" />
<Label fx:id="lab12" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="2" GridPane.rowIndex="1" />
<Label fx:id="lab13" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="3" GridPane.rowIndex="1" />
<Label fx:id="lab14" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="4" GridPane.rowIndex="1" />
<Label fx:id="lab15" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="5" GridPane.rowIndex="1" />
<Label fx:id="lab21" minHeight="13.0" onMouseClicked="#labClick" prefHeight="40.0" prefWidth="149.9998779296875" text="Label" GridPane.columnIndex="1" GridPane.rowIndex="3" />
<Label fx:id="lab22" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="2" GridPane.rowIndex="3" />
<Label fx:id="lab23" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="3" GridPane.rowIndex="3" />
<Label fx:id="lab32" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="2" GridPane.rowIndex="5" />
<Label fx:id="lab31" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="1" GridPane.rowIndex="5" />
<Label fx:id="lab33" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="3" GridPane.rowIndex="5" />
<Label fx:id="lab34" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="4" GridPane.rowIndex="5" />
<Label fx:id="lab24" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="4" GridPane.rowIndex="3" />
<Label fx:id="lab25" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="5" GridPane.rowIndex="3" />
<Label fx:id="lab35" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="5" GridPane.rowIndex="5" />
<Label fx:id="lab41" onMouseClicked="#labClick" prefHeight="40.000099999997474" prefWidth="150.0" text="Label" GridPane.columnIndex="1" GridPane.rowIndex="7" />
<Label fx:id="lab42" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="2" GridPane.rowIndex="7" />
<Label fx:id="lab16" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="6" GridPane.rowIndex="1" />
<Label fx:id="lab17" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="7" GridPane.rowIndex="1" />
<Label fx:id="lab26" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="6" GridPane.rowIndex="3" />
<Label fx:id="lab43" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="3" GridPane.rowIndex="7" />
<Label fx:id="lab44" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="4" GridPane.rowIndex="7" />
<Label fx:id="lab45" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="5" GridPane.rowIndex="7" />
<Label fx:id="lab36" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="6" GridPane.rowIndex="5" />
<Label fx:id="lab46" onMouseClicked="#labClick" prefHeight="39.9998779296875" prefWidth="150.0" text="Label" GridPane.columnIndex="6" GridPane.rowIndex="7" />
<Label fx:id="lab47" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="7" GridPane.rowIndex="7" />
<Label fx:id="lab37" onMouseClicked="#labClick" prefHeight="40.000099999997474" prefWidth="139.0" text="Label" GridPane.columnIndex="7" GridPane.rowIndex="5" />
<Label fx:id="lab27" onMouseClicked="#labClick" prefHeight="44.0" prefWidth="139.0" text="Label" GridPane.columnIndex="7" GridPane.rowIndex="3" />
<TextArea fx:id="week3" prefHeight="100.00009999999747" prefWidth="70.0" text="Week x Year x" wrapText="true" GridPane.columnIndex="0" GridPane.rowIndex="6" />
<Button fx:id="start" mnemonicParsing="false" onAction="#ClickStart" prefHeight="30.0" prefWidth="70.0" text="Start" GridPane.columnIndex="0" GridPane.rowIndex="1" />
<VBox fx:id="vb11" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="1" GridPane.rowIndex="2" />
<VBox fx:id="vb12" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="2" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb13" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="3" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb21" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="1" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb22" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="2" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb23" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="3" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb25" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="5" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb31" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="1" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb32" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="2" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb14" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="4" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb15" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="5" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb35" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="5" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb33" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="3" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb41" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="1" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb42" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="2" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb43" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="3" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb44" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="4" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb45" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="5" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb24" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="4" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb26" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="6" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb36" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="6" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb16" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="6" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb17" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="7" GridPane.rowIndex="2" />
<VBox id="vb12" fx:id="vb27" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="7" GridPane.rowIndex="4" />
<VBox id="vb12" fx:id="vb37" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="7" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb34" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="4" GridPane.rowIndex="6" />
<VBox id="vb12" fx:id="vb46" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="6" GridPane.rowIndex="8" />
<VBox id="vb12" fx:id="vb47" prefHeight="200.0" prefWidth="100.0" GridPane.columnIndex="7" GridPane.rowIndex="8" />
</children>
<columnConstraints>
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="70.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="150.0" />
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" prefWidth="70.0" />
</columnConstraints>
<rowConstraints>
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="100.0" minHeight="10.0" prefHeight="100.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="100.0" minHeight="10.0" prefHeight="100.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="100.0" minHeight="10.0" prefHeight="100.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="100.0" minHeight="10.0" prefHeight="100.0" vgrow="SOMETIMES" />
<RowConstraints maxHeight="40.0" minHeight="10.0" prefHeight="40.0" vgrow="SOMETIMES" />
</rowConstraints>
</GridPane>
</children>
<columnConstraints>
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" />
</columnConstraints>
<rowConstraints>
<RowConstraints minHeight="10.0" vgrow="SOMETIMES" />
</rowConstraints>
</GridPane>
</children>
<columnConstraints>
<ColumnConstraints hgrow="SOMETIMES" minWidth="10.0" />
</columnConstraints>
<rowConstraints>
<RowConstraints minHeight="10.0" vgrow="SOMETIMES" />
</rowConstraints>
</GridPane>
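So my goal is to parse my file as a stream using iterparse from cElementTree, but I want to get each block as a whole. For example, I would like to get the entire image block and then parse the values inside that block.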
For example, I need to get the first image block (<image>
<ref>www.test.com</ref>
<label/>
<number>0</number>
<ID>ID0</ID>
<name>test1</name>
<comment>
<line number="0">This is a comment</line>
<line number="1">This is also another comment</line>
</comment>
<creationDate>2017-02-13T15:46:16-04:00</creationDate>
</image>
<result>
<ref>www.test1.com</ref>
<label/>
<number>001</number>
<ID>RE1</ID>
<name>test2</name>
<comment>
<line number="0">This is a comment2</line>
</comment>
<creationDate>2017-01-13T15:46:16-04:00</creationDate>
</result>
<image>
<ref>www.test3.com</ref>
<label/>
<number>1</number>
<ID>ID1</ID>
<value>10030</value>
<name>test3</name>
<comment>
<line number="0">This is a comment3</line>
</comment>
<creationDate>2017-04-13T15:46:16-04:00</creationDate>
</image>
) and then print the values inside it: www.test.com, 0, ID0, test1, This is a comment, and 2017-02-13T15:46:16-04:00
So I used the following code, but it seems to just read the XML file line by line, and it cannot print the values in each line or element:
*<image>... </image>*
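Can you help me solve this problem? I am new to XML parsing. I would also like to convert each parsed block into a Python dictionary. Is that possible?
Answer 0 (score: 0)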
It is not reading the XML file "line by line". It is returning an end event at the close of each element. That is, if your input file looks like this:
<data>
  <widgets location="earth">
    <widget name="gizmo"/>
    <widget name="gadget"/>
    <widget name="thingamajig"/>
  </widgets>
</data>
then the sequence of values returned from a simple call to iterparse is:
end <Element widget at 0x7f31e3132488>
end <Element widget at 0x7f31e3123f38>
end <Element widget at 0x7f31e3123ef0>
end <Element widgets at 0x7f31e31327a0>
end <Element data at 0x7f31e31324d0>
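For reference, a minimal self-contained sketch of such a simple call might look like the following. It uses the standard library's xml.etree.ElementTree rather than lxml and reads the sample from an in-memory buffer; both are assumptions made only so that the snippet runs without extra files or packages. The helper name end_events is hypothetical. It collects tag names instead of printing element objects, because the memory addresses shown above differ on every run.

```python
import io
import xml.etree.ElementTree as etree

# The sample document from above, held in memory instead of a file (assumption).
SAMPLE = b"""<data>
<widgets location="earth">
<widget name="gizmo"/>
<widget name="gadget"/>
<widget name="thingamajig"/>
</widgets>
</data>"""


def end_events(source):
    """Return (event, tag) pairs from a default iterparse call."""
    # With no 'events' argument, iterparse reports only 'end' events.
    return [(event, element.tag) for event, element in etree.iterparse(source)]


if __name__ == "__main__":
    for event, tag in end_events(io.BytesIO(SAMPLE)):
        print(event, tag)
```

Each widget is reported before the enclosing widgets element, and data comes last, because an element only ends once all of its children have ended.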
If you like, you can also receive start events at the beginning of each element, like this:
for event, element in etree.iterparse(fd, events=('start', 'end')):
    print(event, element)
The output is:
start <Element data at 0x7fccf78cc518>
start <Element widgets at 0x7fccf78cc7e8>
start <Element widget at 0x7fccf78cc4d0>
end <Element widget at 0x7fccf78cc4d0>
start <Element widget at 0x7fccf78bdf80>
end <Element widget at 0x7fccf78bdf80>
start <Element widget at 0x7fccf78bdf38>
end <Element widget at 0x7fccf78bdf38>
end <Element widgets at 0x7fccf78cc7e8>
end <Element data at 0x7fccf78cc518>
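One thing the paired start and end events make possible is keeping track of how deeply nested the current element is. The sketch below illustrates that idea under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, reads the sample from memory, and the helper name tags_with_depth is hypothetical.

```python
import io
import xml.etree.ElementTree as etree

# The same sample document, held in memory instead of a file (assumption).
SAMPLE = b"""<data>
<widgets location="earth">
<widget name="gizmo"/>
<widget name="gadget"/>
<widget name="thingamajig"/>
</widgets>
</data>"""


def tags_with_depth(source):
    """Return (depth, tag) for every element, in document order."""
    depth = 0
    result = []
    for event, element in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            # Entering an element: record it at the current depth.
            result.append((depth, element.tag))
            depth += 1
        else:
            # Leaving an element: step back up one level.
            depth -= 1
    return result


if __name__ == "__main__":
    for depth, tag in tags_with_depth(io.BytesIO(SAMPLE)):
        print("  " * depth + tag)
```

Knowing the depth can help distinguish a top-level block from an element with the same tag nested further down.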
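If I wanted to build a list of widgets for each location, I would probably respond to the start event for a widgets element by initializing a list, and then append each new widget to that list until I reach the closing widgets tag, like this: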
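from lxml import etree

# Open in binary mode: under Python 3, lxml expects the file to yield bytes.
with open('data2.xml', 'rb') as fd:
    widgets = {}
    loc = None
    for event, element in etree.iterparse(fd, events=('start', 'end')):
        # A new <widgets> group begins: start a fresh list for its location.
        if event == 'start' and element.tag == 'widgets':
            loc = element.get('location')
            widgets[loc] = []
        # A <widget> has been fully parsed: record its name.
        elif event == 'end' and element.tag == 'widget':
            widgets[loc].append(element.get('name'))

print(widgets)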
Its output is:
{'earth': ['gizmo', 'gadget', 'thingamajig']}
I hope this gives you an idea of how to handle each block of interest in your input file.
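Applied to the image blocks from the question, the same idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the blocks are wrapped in a hypothetical <root> element (the question does not show the real root), the sample is an abbreviated excerpt held in memory, the standard library's xml.etree.ElementTree is used instead of lxml, and the helper names block_to_dict and iter_blocks are made up for this example.

```python
import io
import xml.etree.ElementTree as etree

# Abbreviated excerpt of the question's blocks inside a hypothetical <root>.
SAMPLE = b"""<root>
<image>
<ref>www.test.com</ref>
<label/>
<number>0</number>
<ID>ID0</ID>
<name>test1</name>
<comment>
<line number="0">This is a comment</line>
<line number="1">This is also another comment</line>
</comment>
<creationDate>2017-02-13T15:46:16-04:00</creationDate>
</image>
<result>
<ref>www.test1.com</ref>
<ID>RE1</ID>
</result>
</root>"""


def block_to_dict(block):
    """Flatten one block into a dict keyed by child tag (hypothetical helper)."""
    record = {}
    for child in block:
        if child.tag == "comment":
            # Keep every <line> of the comment, in order.
            record["comment"] = [line.text for line in child.findall("line")]
        else:
            record[child.tag] = child.text
    return record


def iter_blocks(source, tag="image"):
    """Yield a dict for each complete block with the given tag."""
    for event, element in etree.iterparse(source, events=("end",)):
        if element.tag == tag:
            # At the 'end' event the whole block, children included, is parsed.
            yield block_to_dict(element)
            # Release this block's children once it has been consumed.
            element.clear()


if __name__ == "__main__":
    for record in iter_blocks(io.BytesIO(SAMPLE)):
        print(record)
```

For a 10 GB file the source would be the file path or a file opened in binary mode rather than a BytesIO buffer. Note that this sketch only clears the blocks it yields; other blocks such as result, and the emptied image shells still attached to the root, keep using some memory, so a real run would likely need to discard those as well.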