Question

I have the following XML:

<Content>
<article title="I Compute, Therefore I am" id="a1">
        <authors>
            <author>Philbert von Cookie</author>
            <author>Alice Brockman</author>
            <author>Pedro Smith</author>
        </authors>
        <journal>
            <name>Journal of Computational Metaphysics</name>
            <volume>3</volume>
            <issue>7</issue>
            <published>04/11/2006</published>
            <pages start="42" end="49"/>
        </journal>
</article>
...
</Content>

There are many similar article nodes under the root element, Content.

I have parsed the XML in Python and want to get the maximum date value. Here is my Python code:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='data.xml')
root = tree.getroot()
root.tag, root.attrib

I tried to get it with iterfind(), but so far that has not worked.

for elem in tree.iterfind('(/*/*/journal/published/value[not(text() < preceding-sibling::value/text()) and not(text() < following-sibling::value/text())])[1]'):
 print (elem.text)

Could you help me work out how to write the XPath for iterfind(), or is there perhaps another way? Thanks.

Answer 1

xml.etree.ElementTree provides only limited XPath support.
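As a rough illustration of what the supported subset does cover, the sketch below runs two simple predicates against a trimmed, inline copy of the sample XML (the SAMPLE string and the variable names are assumptions made for this example). Constructs such as not(), comparisons with <, and the preceding-sibling:: axis used in the question are outside that subset.

```python
import xml.etree.ElementTree as ET

# Trimmed, inline copy of the question's XML (assumed for illustration).
SAMPLE = """
<Content>
  <article title="I Compute, Therefore I am" id="a1">
    <journal>
      <name>Journal of Computational Metaphysics</name>
      <volume>3</volume>
      <published>04/11/2006</published>
    </journal>
  </article>
</Content>
"""

root = ET.fromstring(SAMPLE)

# Attribute predicate: select the article whose id is "a1".
by_id = root.findall(".//article[@id='a1']/journal/published")

# Child-text predicate: select journals whose <volume> child has text "3".
by_volume = root.findall(".//journal[volume='3']/name")

print([e.text for e in by_id])
print([e.text for e in by_volume])
```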

One alternative is to collect all the dates into a list and take the maximum:

from datetime import datetime

dates = [published.text for published in root.iterfind('.//article/journal/published')]
print(max(dates, key=lambda x: datetime.strptime(x, '%d/%m/%Y')))

Note that to find the maximum here you should compare datetime values, not strings (this is where the key function helps).
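To see why the key function matters, here is a small sketch with made-up day/month/year values (the list contents are hypothetical, not taken from the question's data). Plain string comparison orders by the leading day digits, so it picks the wrong entry:

```python
from datetime import datetime

# Hypothetical sample values in day/month/year format.
dates = ["04/11/2006", "03/12/2007", "25/01/2005"]

# Lexicographic comparison: "25/..." sorts highest because '2' > '0'.
latest_as_string = max(dates)

# Comparing parsed datetime values gives the chronologically latest date.
latest_as_date = max(dates, key=lambda d: datetime.strptime(d, "%d/%m/%Y"))

print(latest_as_string)  # 25/01/2005
print(latest_as_date)    # 03/12/2007
```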

In addition, if you want the journal record that corresponds to the latest date, you can build a "date -> journal" dictionary and then look up the matching journal record:

from datetime import datetime
import operator

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='data.xml')
root = tree.getroot()

mapping = {datetime.strptime(journal.findtext('published'), '%d/%m/%Y'): journal 
           for journal in root.iterfind('.//article/journal')}

journal_latest = max(mapping.items(), key=operator.itemgetter(0))[1]
print(journal_latest.findtext('name'))
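One caveat with a date-keyed dictionary is that journals sharing the same date overwrite each other. A possible variation, sketched here under the assumption of Python 3 and an inline sample (the second article and the published_date helper are hypothetical, added only for illustration), is to call max() directly on the journal elements:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Inline sample; the second article is hypothetical, added for illustration.
SAMPLE = """
<Content>
  <article title="I Compute, Therefore I am" id="a1">
    <journal>
      <name>Journal of Computational Metaphysics</name>
      <published>04/11/2006</published>
    </journal>
  </article>
  <article title="Hypothetical Second Article" id="a2">
    <journal>
      <name>Hypothetical Journal</name>
      <published>03/12/2007</published>
    </journal>
  </article>
</Content>
"""


def published_date(journal):
    """Parse a journal's <published> text as a day/month/year date."""
    return datetime.strptime(journal.findtext("published"), "%d/%m/%Y")


root = ET.fromstring(SAMPLE)

# Pick the journal element with the latest parsed date; no dictionary needed.
journal_latest = max(root.iterfind(".//article/journal"), key=published_date)
print(journal_latest.findtext("name"))
```

If several journals share the latest date, max() returns the first one it encounters in document order.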

How do I get the latest date via XPath?
