XPath to pull the maximum value when there are multiple matches

Date: 2016-11-07 21:03:02

Tags: xml xpath hive hiveql hdinsight

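I am creating a Hive external table from XML. I want to pull the value from the element that has the latest timestamp. How do I write this in the CREATE TABLE statement?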

My XML:

<Parent>
    <Child>
        <Purchase value="100" id="350" timestamp="2016-10-08T14:22:31.0000000"/>
    </Child>
    <Child>
        <Purchase value="110" id="350" timestamp="2016-10-08T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="105" id="350" timestamp="2016-10-09T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="75" id="350" timestamp="2016-10-10T14:22:32.0000000"/>
    </Child>
</Parent>

The statement below returns all 4 prices, but I only want the price with the most recent timestamp. How can I do that in Hive?

CREATE EXTERNAL TABLE Recommended_StagingTable (
  ItemPrice INT
)
ROW FORMAT SERDE
  'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  -- the key is column.xpath.<column name>; Hive stores column names in lower case
  "column.xpath.itemprice" = "/Parent/Child/Purchase[@id='350']/@value"
)

1 answer:

Answer 0 (score: 0)

Add a purchase_timestamp column to Recommended_StagingTable, then use the SQL row_number() analytic function to find the latest row by timestamp:

select ItemPrice 
  from 
      (
      select 
            ItemPrice ,
            purchase_timestamp,
            row_number() over(order by purchase_timestamp desc ) rn
                              --add partition by if necessary 
        from Recommended_StagingTable
      )s
 where rn = 1; --the latest by timestamp
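As a rough local sanity check of the logic above (this is not Hive), the sketch below uses Python's sqlite3 and xml.etree modules: it loads the four purchases from the question into an in-memory table shaped like the staging table (ItemPrice plus purchase_timestamp) and runs the same row_number() query. The helper name latest_price and the table layout are hypothetical, and SQLite 3.25 or newer is assumed for window-function support. It also illustrates why ordering a STRING timestamp works here: these ISO-8601 values all have the same width, so text order matches chronological order.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The question's XML, with each <Purchase> element closed ("/>") so it parses.
XML = """
<Parent>
    <Child>
        <Purchase value="100" id="350" timestamp="2016-10-08T14:22:31.0000000"/>
    </Child>
    <Child>
        <Purchase value="110" id="350" timestamp="2016-10-08T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="105" id="350" timestamp="2016-10-09T14:22:32.0000000"/>
    </Child>
    <Child>
        <Purchase value="75" id="350" timestamp="2016-10-10T14:22:32.0000000"/>
    </Child>
</Parent>
"""


def latest_price(xml_text: str, purchase_id: str = "350") -> int:
    """Hypothetical helper: return the value of the newest Purchase for an id."""
    root = ET.fromstring(xml_text.strip())
    # One row per Purchase, standing in for a staging table with a timestamp column.
    rows = [
        (int(p.get("value")), p.get("timestamp"))
        for p in root.findall(f"./Child/Purchase[@id='{purchase_id}']")
    ]
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Recommended_StagingTable "
        "(ItemPrice INT, purchase_timestamp TEXT)"
    )
    con.executemany("INSERT INTO Recommended_StagingTable VALUES (?, ?)", rows)
    # Same shape as the Hive query: number rows newest-first, keep row 1.
    # Fixed-width ISO-8601 strings sort as text in chronological order.
    (price,) = con.execute(
        """
        SELECT ItemPrice
          FROM (SELECT ItemPrice,
                       row_number() OVER (ORDER BY purchase_timestamp DESC) AS rn
                  FROM Recommended_StagingTable) s
         WHERE rn = 1
        """
    ).fetchone()
    con.close()
    return price


print(latest_price(XML))  # 75
```

If the table held several ids, adding PARTITION BY id inside OVER (...) would return the latest price per id, which is what the "add partition by if necessary" comment in the Hive query refers to.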