I'm creating a dictionary from an XML file like this:
for edge in root.findall('n:graph/n:edge', ns):
    source = edge.get('source')
    target = edge.get('target')
    edges[(source, target)] = tuple([data.text for data in edge if \
        (data.get('key') == keys[0] or data.get('key') == keys[1])])
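which gives me output like this:
{('4893468839', '977369380'): ('name', 'length'), ...}
Is it possible to use a default text, 'noName', when the name field has no value? Every key has a length value, but not every key has a name, so I want to avoid output like this:
{('4893468839', '977369380'): ('length',), ...}
and get something like this instead:
{('4893468839', '977369380'): ('noName', 'length'), ...}
More details: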
from lxml import etree

class graph():
    _path = ""

    def _readFile(self):
        data = etree.parse(self._path)
        root = data.getroot()
        for edge in root.findall('n:graph/n:edge', ns):
            source = edge.get('source')
            target = edge.get('target')
            edges[(source, target)] = tuple([data.text for data in edge if data.get('key') in keys[:2]])
Given XML like the following:
<key attr.name="ref" attr.type="string" for="edge" id="d14" />
<key attr.name="name" attr.type="string" for="edge" id="d13" />
<key attr.name="geometry" attr.type="string" for="edge" id="d12" />
<key attr.name="length" attr.type="string" for="edge" id="d11" />
<key attr.name="oneway" attr.type="string" for="edge" id="d10" />
<key attr.name="highway" attr.type="string" for="edge" id="d9" />
<key attr.name="bridge" attr.type="string" for="edge" id="d8" />
<key attr.name="osmid" attr.type="string" for="edge" id="d7" />
<edge id="0" source="4331489627" target="4331489577">
<data key="d7">435211336</data>
<data key="d13">Calle Carretera</data>
<data key="d9">residential</data>
<data key="d10">False</data>
<data key="d11">52.45</data>
<data key="d12">LINESTRING (-4.8413613 39.4799045, -4.8414814 39.4798489, -4.8419449 39.4797838)</data>
</edge>
the following output is generated correctly:
{('4331489627', '4331489577'): ('Calle Carretera', '52.45')}
But some edges, like this one, are missing the name (d13) data tag:
<edge id="0" source="982621562" target="946409159">
<data key="d7">483537143</data>
<data key="d14">CM-4106</data>
<data key="d9">secondary</data>
<data key="d10">False</data>
<data key="d11">104.66499999999999</data>
<data key="d12">LINESTRING (-4.8366071 39.4783468, -4.8368979 39.4789602, -4.8371678 39.4791592)</data>
</edge>
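In this case, because no text is found for the name tag, I get this output:
{('982621562', '946409159'): ('104.66499999999999',)}
If possible, I'd like to get something like this instead:
{('982621562', '946409159'): ('noName', '104.66499999999999')}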
Answer 0 (score: 1)
Based on the above, I put together a working example:
from lxml import etree
root = etree.fromstring("""
<xml><graph>
<key attr.name="ref" attr.type="string" for="edge" id="d14" />
<key attr.name="name" attr.type="string" for="edge" id="d13" />
<key attr.name="geometry" attr.type="string" for="edge" id="d12" />
<key attr.name="length" attr.type="string" for="edge" id="d11" />
<key attr.name="oneway" attr.type="string" for="edge" id="d10" />
<key attr.name="highway" attr.type="string" for="edge" id="d9" />
<key attr.name="bridge" attr.type="string" for="edge" id="d8" />
<key attr.name="osmid" attr.type="string" for="edge" id="d7" />
<edge id="0" source="4331489627" target="4331489577">
<data key="d7">435211336</data>
<data key="d13">Calle Carretera</data>
<data key="d9">residential</data>
<data key="d10">False</data>
<data key="d11">52.45</data>
<data key="d12">LINESTRING (-4.8413613 39.4799045, -4.8414814 39.4798489, -4.8419449 39.4797838)</data>
</edge>
<edge id="0" source="982621562" target="946409159">
<data key="d7">483537143</data>
<data key="d14">CM-4106</data>
<data key="d9">secondary</data>
<data key="d10">False</data>
<data key="d11">104.66499999999999</data>
<data key="d12">LINESTRING (-4.8366071 39.4783468, -4.8368979 39.4789602, -4.8371678 39.4791592)</data>
</edge>
</graph></xml>
""")
# Map each attribute name (e.g. 'name') to its key id (e.g. 'd13').
keys = {}
for key in root.findall('graph/key'):
    keys[key.get('attr.name')] = key.get('id')

key_name = keys['name']
key_length = keys['length']

out = {}
for edge in root.findall('graph/edge'):
    # Collect this edge's <data> children as {key id: text}.
    data = dict((d.get('key'), d.text) for d in edge.findall('data'))
    # Fall back to 'noName' when the edge has no name entry.
    value = (data.get(key_name, 'noName'), data[key_length])
    out[(edge.get('source'), edge.get('target'))] = value

print(out)
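Note that the second edge now gets 'noName' in place of the missing name. Before, the value went "missing" because your comprehension was filtering it out. My code instead builds a dictionary from the edge's data elements, then always fills the value in out with a two-element tuple.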
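To make the difference between filtering and lookup concrete, here is a minimal sketch that runs both approaches on a single edge with no d13 entry. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra installs (the calls used here behave the same way in lxml), and the shortened one-edge sample and the hard-coded key ids are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-edge sample with no "d13" (name) entry.
edge = ET.fromstring("""
<edge id="0" source="982621562" target="946409159">
  <data key="d14">CM-4106</data>
  <data key="d11">104.66499999999999</data>
</edge>
""")

# Key ids hard-coded here for brevity; the answer derives them from <key>.
key_name, key_length = "d13", "d11"

# Filtering: a missing key contributes nothing, so the tuple shrinks.
filtered = tuple(d.text for d in edge if d.get("key") in (key_name, key_length))

# Lookup: build a dict first, then ask for each field explicitly.
data = {d.get("key"): d.text for d in edge.findall("data")}
looked_up = (data.get(key_name, "noName"), data[key_length])

print(filtered)   # ('104.66499999999999',)
print(looked_up)  # ('noName', '104.66499999999999')
```

One caveat: dict.get only falls back to its default when the key is absent. If an edge had an empty d13 element, its text would be None and the default would not apply; writing data.get(key_name) or 'noName' would cover both cases.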