How to extract data from XML into a dataset with R

Time: 2018-09-02 11:33:35

Tags: r xml

This is my source link: source

 <AOSBS_XML Name="HR_ODDS_WIN" Timestamp="2018-09-02 17:59:16" Version="L2.2R1C" ID="355">
 <Meetings>
     ..........
  <Pools>
  <PoolInfo Pool="WIN" OddsUpdateTime="16:06" Enabled="0">
   <OddsSet>
    <OddsInfo Number="1" Odds="16" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="16300"/>
    <OddsInfo Number="2" Odds="3.5" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="3550"/>
    <OddsInfo Number="3" Odds="12" Scratched="0" OddsDrop="14.28" Hot="0" WillPay="12950"/>
    <OddsInfo Number="4" Odds="12" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="12950"/>
    <OddsInfo Number="5" Odds="2.4" Scratched="0" OddsDrop="27.27" Hot="1" WillPay="2400"/>
    <OddsInfo Number="6" Odds="6.6" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="6600"/>
    <OddsInfo Number="7" Odds="35" Scratched="0" OddsDrop="23.91" Hot="0" WillPay="35300"/>
    <OddsInfo Number="8" Odds="8.2" Scratched="0" OddsDrop="18.00" Hot="0" WillPay="8250"/>
   </OddsSet>
  </PoolInfo>
  </Pools>
     ...........
  </Meetings>
 </AOSBS_XML>

This is my code:

 library(XML)
 url = "http://iosbsinfo02.hkjc.com/infoA/AOSBS/HR_GetInfo.ashx?QT=HR_ODDS_win&Venue=*&Race=7"
 doc = xmlParse(url)
 root = xmlRoot(doc)
 root

However, I don't know how to extract the OddsSet part into a dataset. Can anyone help me?

2 Answers:

Answer 0: (score: 1)

This should work:

library(xml2)
library(purrr)  # also provides the %>% pipe; map_df() needs dplyr installed

url <- "http://iosbsinfo02.hkjc.com/infoA/AOSBS/HR_GetInfo.ashx?QT=HR_ODDS_win&Venue=*&Race=7"
doc <- read_xml(url)

# Each child of <OddsSet> is an <OddsInfo> node; its attributes become one row
OddsSet <- xml_find_all(doc, ".//OddsSet") %>%
  xml_children() %>%
  map(xml_attrs) %>%
  map_df(~ as.list(.))
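
Since the live URL may not always be reachable, the same idea can be tried offline. The following is only a sketch: the inline string `xml_text_sample` is a hypothetical, trimmed copy of the XML shown in the question, and the final conversion to numeric is an assumption about what the dataset should look like (attribute values always come back as character strings).

```r
library(xml2)

# Hypothetical inline sample, trimmed from the XML shown in the question
xml_text_sample <- '<AOSBS_XML Name="HR_ODDS_WIN">
  <Pools>
    <PoolInfo Pool="WIN" OddsUpdateTime="16:06" Enabled="0">
      <OddsSet>
        <OddsInfo Number="1" Odds="16" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="16300"/>
        <OddsInfo Number="2" Odds="3.5" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="3550"/>
        <OddsInfo Number="5" Odds="2.4" Scratched="0" OddsDrop="27.27" Hot="1" WillPay="2400"/>
      </OddsSet>
    </PoolInfo>
  </Pools>
</AOSBS_XML>'

doc <- read_xml(xml_text_sample)

# Select the <OddsInfo> nodes directly instead of going through xml_children()
nodes <- xml_find_all(doc, ".//OddsInfo")

# xml_attrs() on a nodeset returns a list of named character vectors, one per node
attr_list <- xml_attrs(nodes)

# Stack the vectors row-wise into a character matrix, then into a data frame
odds <- as.data.frame(do.call(rbind, attr_list), stringsAsFactors = FALSE)

# Assumption: every attribute in this sample is numeric, so convert all columns
odds[] <- lapply(odds, as.numeric)

str(odds)
```

This variant needs only xml2 and base R. If some attributes were not numeric, the conversion step would have to be applied column by column instead.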

Answer 1: (score: 0)

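For attribute-centric data extraction, consider the XML package's internal method xmlAttrsToDataFrame, which is accessible through the triple-colon operator: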

library(XML)
...

df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, path='//OddsInfo'))
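
Because the triple-colon operator reaches into functions a package does not export, such a function could change or disappear between versions. As a hedged alternative, a similar result can be sketched with exported XML functions only. The inline string `xml_text_sample` is a hypothetical, trimmed copy of the question's XML, used so the example runs without network access.

```r
library(XML)

# Hypothetical inline sample, trimmed from the XML shown in the question
xml_text_sample <- '<AOSBS_XML Name="HR_ODDS_WIN">
  <Pools>
    <PoolInfo Pool="WIN" OddsUpdateTime="16:06" Enabled="0">
      <OddsSet>
        <OddsInfo Number="1" Odds="16" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="16300"/>
        <OddsInfo Number="2" Odds="3.5" Scratched="0" OddsDrop="0.00" Hot="0" WillPay="3550"/>
      </OddsSet>
    </PoolInfo>
  </Pools>
</AOSBS_XML>'

# asText = TRUE tells xmlParse() the argument is XML content, not a file name or URL
doc <- xmlParse(xml_text_sample, asText = TRUE)

# Collect every <OddsInfo> node anywhere in the document
nodes <- getNodeSet(doc, "//OddsInfo")

# xmlAttrs() returns a named character vector for each node
rows <- lapply(nodes, xmlAttrs)

# Bind the vectors row-wise; all columns are character at this point
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Convert the columns of interest explicitly
df$Number <- as.integer(df$Number)
df$Odds <- as.numeric(df$Odds)

print(df)
```

This relies on every OddsInfo node carrying the same attributes in the same order, as in the sample; nodes with differing attribute sets would need to be aligned by name before binding.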