Python XPath: comparing dates

Time: 2016-05-06 09:56:08

Tags: python xpath lxml xml.etree

I have this simplified XML with many A elements:

<root>      
    <A class="a" version="7">
      <details>
          <dates>
            <status date="2013-04-29T04:16:49.792-04:00">ACCEPTED</status>
            <status date="2013-08-12T04:08:23.773-04:00">ACCEPTED</status>
          </dates>
      </details>
    </A>
    <A class="a" version="7">
     ...
</root>

How can I use lxml XPath to get only the A elements whose last status date is later than a specific point in time?

What I have done so far:

from lxml import etree
tree = etree.parse("./my.xml")
root = tree.getroot()
res = root.xpath("A[./details/dates/status[last()]/@date > '2013-08-12T00:00:0.000-04:00' ]");

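The problem with this code is that, for some reason, the comparison is always false, so res is always empty.

Any help or suggestions are appreciated.

3 answers:

Answer 0: (score: 2)

You need to strip the separators with translate() and compare the values as numbers: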

In [24]: x = """<root>
   ....:     <A class="a" version="7">
   ....:       <details>
   ....:           <dates>
   ....:             <status date="2013-04-29T04:16:49.792-04:00">ACCEPTED</status>
   ....:             <status date="2013-08-12T04:08:23.773-04:00">ACCEPTED</status>
   ....:           </dates>
   ....:       </details>
   ....:     </A>
   ....:     <A class="a" version="7">
   ....: </root>"""

In [25]: from lxml import html


In [26]: xml = html.fromstring(x)


In [27]: print(xml.xpath("a[translate(./details/dates/status[last()]/@date,'-:T.','') > '201308120000000000400']"))
[<Element a at 0x7fdb45bc8aa0>]

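As long as you are always comparing dates with the same UTC offset, and the dates are in ISO 8601 format with the same number of digits, comparing them as numbers is safe. If the offsets or the number of digits differ, you will have to compare them as datetime objects.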
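A minimal sketch of the datetime fallback, for the case where offsets differ. It uses only the standard library (xml.etree.ElementTree and datetime.fromisoformat, Python 3.7+) so it runs without extra installs. The helper name elements_after and the second A element with a +02:00 offset are made up for illustration; the same loop should work on an lxml tree as well.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Sample data: the second A element is hypothetical and uses a different offset.
XML = """<root>
  <A class="a"><details><dates>
    <status date="2013-04-29T04:16:49.792-04:00">ACCEPTED</status>
    <status date="2013-08-12T04:08:23.773-04:00">ACCEPTED</status>
  </dates></details></A>
  <A class="b"><details><dates>
    <status date="2013-08-12T01:30:00.000+02:00">ACCEPTED</status>
  </dates></details></A>
</root>"""


def elements_after(root, cutoff):
    """Return the A elements whose last status date is later than cutoff."""
    selected = []
    for a in root.findall("A"):
        last_status = a.find("./details/dates/status[last()]")
        if last_status is None:
            continue  # no status entries at all
        # Timezone-aware datetimes compare by absolute time, not by digits.
        if datetime.fromisoformat(last_status.get("date")) > cutoff:
            selected.append(a)
    return selected


root = ET.fromstring(XML)
cutoff = datetime.fromisoformat("2013-08-12T00:00:00.000-04:00")
matches = elements_after(root, cutoff)
print([a.get("class") for a in matches])
```

Here the "b" element is excluded because 01:30 at +02:00 is earlier than midnight at -04:00, even though its digit string would compare as larger.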

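Answer 1: (score: 1)

There is no date type in XPath 1.0, and you cannot compare strings in XPath 1.0 with operators other than = and !=. There is a package that supports parts of XPath 2 in Python, but I have never tried it (see here). That might be one way to go.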
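A small sketch of why the original expression comes back empty, assuming lxml is installed. In XPath 1.0 the relational operators convert both operands to numbers, and a date string with separators converts to NaN, so the comparison is false. The variable names are illustrative only.

```python
import math
from lxml import etree

root = etree.fromstring("<root/>")

# A date string with separators is not a valid XPath number, so it becomes NaN.
as_number = root.xpath("number('2013-08-12T04:08:23.773-04:00')")

# '>' converts both strings to numbers first; NaN > NaN is false.
greater = root.xpath(
    "'2013-08-12T04:08:23.773-04:00' > '2013-08-12T00:00:00.000-04:00'"
)

# Once the separators are stripped, both sides are valid numbers.
stripped = root.xpath(
    "translate('2013-08-12T04:08:23.773-04:00', '-:T.', '')"
    " > '201308120000000000400'"
)

print(math.isnan(as_number), greater, stripped)
```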

Answer 2: (score: 1)

You can use dateutil.parser:

from lxml import etree
from datetime import datetime
from dateutil.parser import parse

a = '''<root>      
    <A class="a" version="7">
      <details>
          <dates>
            <status date="2013-04-29T04:16:49.792-04:00">ACCEPTED</status>
            <status date="2013-08-12T04:08:23.773-04:00">ACCEPTED</status>
          </dates>
      </details>
    </A>
    <A class="b" version="8">
      <details>
          <dates>
            <status date="2012-04-29T04:16:49.792-04:00">ACCEPTED</status>
            <status date="2012-08-12T04:08:23.773-04:00">ACCEPTED</status>
          </dates>
      </details>
    </A>
 </root> '''

tree = etree.fromstring(a)

# Set your begin time
beginTime = parse('2013-08-12T00:00:0.000-04:00')

# Loop through all A elements
for A in tree.findall('A'):
    # Get the last time of the A element
    timeA = A.find('./details/dates/status[last()]')   

    # Parse the found date into a datetime element
    date = parse(timeA.get('date'))

    # Compare the beginTime with the found date
    if beginTime < date:
        # Do as you like
        print(date)