Hi all, I am trying to parse some data from an XML file using Python 3. Here is my XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Created on Fri Sep 07 08:20:37 WAT 2018 with ROAMSMART IREG-360 // www.roam-smart.com -->
<tadig-raex-21:TADIGRAEXIR21 xmlns:tadig-raex-21="https://infocentre.gsm.org/TADIG-RAEX-IR21" xmlns:ns2="https://infocentre.gsm.org/TADIG-GEN">
<tadig-raex-21:RAEXIR21FileHeader>
<tadig-raex-21:FileCreationTimestamp>2018-01-08T15:42:21+01:00</tadig-raex-21:FileCreationTimestamp>
<tadig-raex-21:FileType>IR.21</tadig-raex-21:FileType>
<tadig-raex-21:SenderTADIG>DEMO</tadig-raex-21:SenderTADIG>
<tadig-raex-21:PublishComment>Update</tadig-raex-21:PublishComment>
<tadig-raex-21:TADIGGenSchemaVersion>2.4</tadig-raex-21:TADIGGenSchemaVersion>
<tadig-raex-21:TADIGRAEXIR21SchemaVersion>10.1</tadig-raex-21:TADIGRAEXIR21SchemaVersion>
</tadig-raex-21:RAEXIR21FileHeader>
<tadig-raex-21:OrganisationInfo>
<tadig-raex-21:OrganisationName>DEMO</tadig-raex-21:OrganisationName>
<tadig-raex-21:CountryInitials>FRA</tadig-raex-21:CountryInitials>
<tadig-raex-21:NetworkList>
<tadig-raex-21:Network>
<tadig-raex-21:TADIGCode>DEMO</tadig-raex-21:TADIGCode>
<tadig-raex-21:NetworkType>Terrestrial</tadig-raex-21:NetworkType>
<tadig-raex-21:NetworkData>
<tadig-raex-21:IPRoaming_IW_InfoSection>
<tadig-raex-21:IPRoaming_IW_Info_General>
<tadig-raex-21:EffectiveDateOfChange>2013-07-01</tadig-raex-21:EffectiveDateOfChange>
<tadig-raex-21:PMNAuthoritativeDNSIPList>
<tadig-raex-21:DNSitem>
<tadig-raex-21:IPAddress>212.234.96.11</tadig-raex-21:IPAddress>
<tadig-raex-21:DNSname>PMASDNS1.mnc001.mcc208.gprs</tadig-raex-21:DNSname>
</tadig-raex-21:DNSitem>
<tadig-raex-21:DNSitem>
<tadig-raex-21:IPAddress>212.234.96.74</tadig-raex-21:IPAddress>
<tadig-raex-21:DNSname>LYLADNS1.mnc001.mcc208.gprs</tadig-raex-21:DNSname>
</tadig-raex-21:DNSitem>
<tadig-raex-21:DNSitem>
<tadig-raex-21:IPAddress>212.234.96.11</tadig-raex-21:IPAddress>
<tadig-raex-21:DNSname>PMASDNS1.mnc001.mcc208.3gppnetwork.org</tadig-raex-21:DNSname>
</tadig-raex-21:DNSitem>
<tadig-raex-21:DNSitem>
<tadig-raex-21:IPAddress>212.234.96.74</tadig-raex-21:IPAddress>
<tadig-raex-21:DNSname>LYLADNS1.mnc001.mcc208.3gppnetwork.org</tadig-raex-21:DNSname>
</tadig-raex-21:DNSitem>
</tadig-raex-21:PMNAuthoritativeDNSIPList>
</tadig-raex-21:IPRoaming_IW_Info_General>
</tadig-raex-21:IPRoaming_IW_InfoSection>
</tadig-raex-21:NetworkData>
<tadig-raex-21:HostedNetworksInfo>
<tadig-raex-21:SectionNA>Section not applicable</tadig-raex-21:SectionNA>
</tadig-raex-21:HostedNetworksInfo>
<tadig-raex-21:PresentationOfCountryInitialsAndMNN>DEMO FR</tadig-raex-21:PresentationOfCountryInitialsAndMNN>
<tadig-raex-21:AbbreviatedMNN>DEMO</tadig-raex-21:AbbreviatedMNN>
<tadig-raex-21:NetworkColourCode>1</tadig-raex-21:NetworkColourCode>
</tadig-raex-21:Network>
</tadig-raex-21:NetworkList>
</tadig-raex-21:OrganisationInfo>
</tadig-raex-21:TADIGRAEXIR21>
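I need to get all the IP addresses from all the DNS items and save them in a list that will be exported to a CSV file. Each IP record should be paired with the TADIG on its own row.
I took inspiration from this link (Getting all instances of child node using xml.etree.ElementTree), and here is my code: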
import csv
import os
from xml.etree import ElementTree as ET
out = csv.writer(open("result.csv", "w"), delimiter=',', quoting=csv.QUOTE_ALL)
# loop through the directory and parse all XML files
directory = "C:\\Users\\Walid Ben Chamekh\\PycharmProjects\\dnsparser\\com\\ir21\\dnsparser\\"
# start parsing
print("Start parsing")
for filename in os.listdir(directory):
    if filename.endswith(".xml"):
        print(filename)
        root = ET.parse(filename).getroot()
        # get Network TADIG code
        raexFileHeader = root.getchildren()[0]
        tadig = raexFileHeader.getchildren()[2].text
        try:
            DNS = root.findall(
                ".//tadig-raex-21:OrganisationInfo/tadig-raex-21:NetworkList/tadig-raex-21:Network["
                "1]/tadig-raex-21:NetworkData/tadig-raex-21:IPRoaming_IW_InfoSection/tadig-raex-21"
                ":IPRoaming_IW_Info_General/tadig-raex-21:PMNAuthoritativeDNSIPList")
        except Exception:
            print("no data")
            continue
        # get all IPs from all dns items
        for item in DNS.getchildren():
            IPresult = [tadig]
            ip = item.getchildren()[0].text
            IPresult.append(ip)
            print(IPresult)
            out.writerow(IPresult)
        continue
    else:
        continue
# End Parsing
print("End Parsing")
It doesn't work; the DNS list is always empty! Thanks for your help.
Answer 0 (score: 0)
The problem is that ElementTree is not very smart about namespaces. In calls to find(), findall() and iterfind(), you need to pass a dictionary containing the namespaces, as described in this answer: https://stackoverflow.com/a/14853417/2044940

namespaces = { "tadig-raex-21": "https://infocentre.gsm.org/TADIG-RAEX-IR21" }
root.findall("...", namespaces)

With this and a few other changes, I was able to make it return the following data:

['DEMO', '212.234.96.11']
['DEMO', '212.234.96.74']
['DEMO', '212.234.96.11']
['DEMO', '212.234.96.74']

Here is the Python script. Note that you need to supply it with a filename pointing to the input XML:

from xml.etree import ElementTree as ET
# Doesn't help, it is only used for serialization, i.e. writing XML, but not parsing
#ET.register_namespace("tadig-raex-21", "https://infocentre.gsm.org/TADIG-RAEX-IR21")
# Dictionary of namespaces, needed to avoid error:
# -> SyntaxError: prefix 'tadig-raex-21' not found in prefix map
namespaces = {
    "tadig-raex-21": "https://infocentre.gsm.org/TADIG-RAEX-IR21"
}
root = ET.parse(filename).getroot()
# Fetch SenderTADIG by path
# TODO: handle case if the element doesn't exist
tadig = root.find(
    "tadig-raex-21:RAEXIR21FileHeader/"
    "tadig-raex-21:SenderTADIG", namespaces).text
# Select DNSitems for further processing
DNS = root.findall(
    "tadig-raex-21:OrganisationInfo/"
    "tadig-raex-21:NetworkList/"
    "tadig-raex-21:Network[1]/"
    "tadig-raex-21:NetworkData/"
    "tadig-raex-21:IPRoaming_IW_InfoSection/"
    "tadig-raex-21:IPRoaming_IW_Info_General/"
    "tadig-raex-21:PMNAuthoritativeDNSIPList/"
    "tadig-raex-21:DNSitem", namespaces)
# DNS is a list of elements, can't call getchildren() on it directly!
for item in DNS:
    IPresult = [tadig]
    # It's safer to fetch the IPAddress via the element name
    ip = item.find("tadig-raex-21:IPAddress", namespaces).text
    IPresult.append(ip)
    print(IPresult)

It is also possible to do without the namespace dictionary, but then the full namespace URI has to be used as a prefix in curly braces (found here):

tadig = root.find(
    "{https://infocentre.gsm.org/TADIG-RAEX-IR21}RAEXIR21FileHeader/"
    "{https://infocentre.gsm.org/TADIG-RAEX-IR21}SenderTADIG").text

Interestingly, it seems to be impossible to read the namespaced attributes of the root element (which would probably let us generate the namespaces dict from them):

# Empty dict
ET.parse(filename).getroot().attrib

The root element does contain the namespace information:

<tadig-raex-21:TADIGRAEXIR21
xmlns:tadig-raex-21="https://infocentre.gsm.org/TADIG-RAEX-IR21"
xmlns:ns2="https://infocentre.gsm.org/TADIG-GEN">

You can't pass the namespaces dict to getroot(), so I don't know whether or how the values of the attributes xmlns:tadig-raex-21 and xmlns:ns2 can be obtained.
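
One possible way around the empty attrib dict, offered as a sketch rather than a definitive fix: ElementTree's iterparse() can report namespace declarations through "start-ns" events, so the prefix map can be collected while the file is parsed. The helper names parse_with_namespaces and dns_rows, the shortened SAMPLE document and the ".//" shortcut path are assumptions made for illustration only. The code also assumes that every input file uses the tadig-raex-21 prefix.

```python
import io
from xml.etree import ElementTree as ET

# Shortened stand-in for the IR.21 file from the question (assumption: same prefix).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tadig-raex-21:TADIGRAEXIR21 xmlns:tadig-raex-21="https://infocentre.gsm.org/TADIG-RAEX-IR21" xmlns:ns2="https://infocentre.gsm.org/TADIG-GEN">
  <tadig-raex-21:RAEXIR21FileHeader>
    <tadig-raex-21:SenderTADIG>DEMO</tadig-raex-21:SenderTADIG>
  </tadig-raex-21:RAEXIR21FileHeader>
  <tadig-raex-21:OrganisationInfo>
    <tadig-raex-21:DNSitem>
      <tadig-raex-21:IPAddress>212.234.96.11</tadig-raex-21:IPAddress>
    </tadig-raex-21:DNSitem>
    <tadig-raex-21:DNSitem>
      <tadig-raex-21:IPAddress>212.234.96.74</tadig-raex-21:IPAddress>
    </tadig-raex-21:DNSitem>
  </tadig-raex-21:OrganisationInfo>
</tadig-raex-21:TADIGRAEXIR21>
"""


def parse_with_namespaces(source):
    """Parse a file name or binary file object; return (root, {prefix: uri})."""
    namespaces = {}
    root = None
    # "start-ns" events carry a (prefix, uri) tuple for each xmlns declaration;
    # the first "start" event carries the root element.
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload
            namespaces[prefix] = uri
        elif root is None:
            root = payload
    # Once the iterator is exhausted, the tree below root is fully built.
    return root, namespaces


def dns_rows(source):
    """Return one [tadig, ip] row per DNSitem/IPAddress found in the document."""
    root, namespaces = parse_with_namespaces(source)
    tadig_el = root.find(
        "tadig-raex-21:RAEXIR21FileHeader/tadig-raex-21:SenderTADIG", namespaces)
    tadig = tadig_el.text if tadig_el is not None else ""
    return [
        [tadig, ip_el.text]
        for ip_el in root.iterfind(
            ".//tadig-raex-21:DNSitem/tadig-raex-21:IPAddress", namespaces)
    ]


if __name__ == "__main__":
    for row in dns_rows(io.BytesIO(SAMPLE)):
        print(row)
```

The rows returned by dns_rows() could then be handed to csv.writer(...).writerows() to produce the CSV export asked about in the question. For real files, pass the full path (for example os.path.join(directory, filename)) instead of the BytesIO object.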