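I am using Python to export a metadata XML file, populate various elements from the script, and then save the XML file back to its source. I am trying to create a child element called "citeinfo" with one sub-element called "pubdate" and another called "othercit". The script runs without errors, but when I open the XML afterwards I get a second "citation" element group (the parent of "citeinfo"), and all of my new elements end up on a single line. Here is my Python: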
import arcpy, sys
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element, SubElement
import xml.etree.ElementTree as ET
from arcpy import env
env.overwriteOutput = True
fcpath = r"...\HL Metadata to BR Sample Data.gdb\NCI20102014_Oral"
translatorpath = r"...\Translator\ARCGIS2FGDC.xml"
xmlfile = r"...\Extras\FullMetaFC.xml"
arcpy.ExportMetadata_conversion(fcpath, translatorpath, xmlfile)
tree = ElementTree()
tree.parse(xmlfile)
a = tree.find('idinfo')
aa = tree.find('metainfo')
aaa = tree.find('eainfo')
b = ET.SubElement(a, 'citation')
c = ET.SubElement(b, 'citeinfo')
bb = ET.SubElement(c, 'pubdate')
d = ET.SubElement(c, 'othercit')
e = ET.SubElement(a, 'descript')
f = ET.SubElement(e, 'abstract')
g = ET.SubElement(e, 'purpose')
title = tree.find("idinfo/citation/citeinfo/title")
public_date = tree.find("idinfo/citation/citeinfo/pubdate")
cit_source = tree.find("idinfo/citation/citeinfo/othercit")
abstract = tree.find("idinfo/descript/abstract")
purpose = tree.find("idinfo/descript/purpose")
title.text = "Oral Cancer Incidence by County"
bb.text = "99990088"
d.text = "https://statecancerprofiles.cancer.gov/"
abstract.text = "Incidence rates are..."
purpose.text = "The State Cancer Profiles..."
tree.write(xmlfile)
arcpy.ImportMetadata_conversion(xmlfile, "FROM_FGDC", fcpath, "ENABLED")
Here is the XML:
<citation>
<citeinfo>
<origin>X</origin>
<title>META_TESTING</title>
<geoform>vector digital data</geoform>
<pubdate>20102010</pubdate><othercit>www.google.com</othercit></citeinfo>
</citation>
I would like the "citation" group to look like this:
<citation>
<citeinfo>
<title>National Cancer Institute, Oral Cancer Incidence by County</title>
<geoform>vector digital data</geoform>
<pubdate>20120510</pubdate>
<othercit>www.google.com</othercit>
</citeinfo>
</citation>
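Answer 0 (score: 1)
I would create a small helper function that ensures an element exists. If the element is already there, the function returns it; if not, it creates it.
def ensure_elem(context, name):
    elem = context.find(name)
    return ET.SubElement(context, name) if elem is None else elem
Now you can do this: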
tree = ET.parse(xmlfile)
# ensure we have a /metadata/idinfo/citation/citeinfo hierarchy
metadata = tree.getroot()
idinfo = ensure_elem(metadata, "idinfo")
citation = ensure_elem(idinfo, "citation")
citeinfo = ensure_elem(citation, "citeinfo")
# update the text of elements beneath citeinfo
ensure_elem(citeinfo, 'pubdate').text = "new pubdate"
ensure_elem(citeinfo, 'title').text = "new title"
# ...and so on
tree.write(xmlfile)
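Note that ET.parse() loads a file in one line of code, so there is no need to create an empty ElementTree and call its parse() method separately.
For brevity, you can also write: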
e = ensure_elem
# ensure we have a /metadata/idinfo/citation/citeinfo hierarchy
citeinfo = e(e(e(tree.getroot(), "idinfo"), "citation"), "citeinfo")
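The duplicate "citation" group in the question comes from ET.SubElement, which always appends a new child even when one with the same tag already exists. The following is a minimal sketch comparing the two approaches on a made-up in-memory document; the SAMPLE string is an assumption for illustration, not the real exported metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for the exported metadata file.
SAMPLE = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>META_TESTING</title>"
    "</citeinfo></citation></idinfo></metadata>"
)

def ensure_elem(context, name):
    """Return the first child called `name`, creating it if it is missing."""
    elem = context.find(name)
    return ET.SubElement(context, name) if elem is None else elem

# SubElement always appends, so the existing <citation> gets a sibling.
root_a = ET.fromstring(SAMPLE)
ET.SubElement(root_a.find("idinfo"), "citation")
duplicated = len(root_a.findall("idinfo/citation"))

# Find-or-create reuses the existing <citation> and only adds what is missing.
root_b = ET.fromstring(SAMPLE)
citeinfo = ensure_elem(ensure_elem(ensure_elem(root_b, "idinfo"), "citation"), "citeinfo")
ensure_elem(citeinfo, "pubdate").text = "20120510"
reused = len(root_b.findall("idinfo/citation"))

print(duplicated, reused)
```

With the first approach the document ends up with two "citation" elements; with the helper it keeps one, and the existing "title" stays in place next to the new "pubdate".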
To pretty-print an ElementTree document, you can use this function:
def indent(tree, indent_by=' '):
    # True for text that is missing or consists only of whitespace
    irrelevant = lambda s: s is None or s.lstrip('\r\n\t\v ') == ''
    # newline followed by i levels of indentation
    indent_str = lambda i: '\n' + indent_by * i
    def indent(elem, level=0, last_child=True):
        if len(elem) and irrelevant(elem.text):
            elem.text = indent_str(level+1)
        if irrelevant(elem.tail):
            elem.tail = indent_str(level-(1 if last_child else 0))
        for i, child in enumerate(elem, 1):
            indent(child, level+1, i==len(elem))
    indent(tree.getroot())
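On Python 3.9 and later, the standard library also ships xml.etree.ElementTree.indent(), which may make a hand-written helper unnecessary. The sketch below assumes Python 3.9+ and uses a made-up one-line document similar to the output in the question; whether the Python bundled with a given ArcGIS installation is recent enough is not something this example checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document, similar to what appended elements look like.
root = ET.fromstring(
    "<citation><citeinfo><pubdate>20120510</pubdate>"
    "<othercit>www.google.com</othercit></citeinfo></citation>"
)
tree = ET.ElementTree(root)

# ET.indent (added in Python 3.9) rewrites whitespace in place.
ET.indent(tree, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Call it just before tree.write(xmlfile) so that the saved file puts each element on its own line.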