Converting an XML DataTable to an R data frame

Time: 2015-11-13 19:48:19

Tags: xml r soap

I am getting an XML DataTable (that is what they call the structure) from a SOAP API. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <soap:Body>
      <UniversalFieldGroupingResponse xmlns="http://rixtrema.net/">
         <UniversalFieldGroupingResult>
            <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet">
               <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="RESULT" msdata:Locale="en-US">
                  <xs:complexType>
                     <xs:choice minOccurs="0" maxOccurs="unbounded">
                        <xs:element name="RESULT" msdata:CaseSensitive="False" msdata:Locale="en-US">
                           <xs:complexType>
                              <xs:sequence>
                                 <xs:element name="Group" type="xs:string" minOccurs="0" />
                                 <xs:element name="Value" type="xs:double" minOccurs="0" />
                                 <xs:element name="Name" type="xs:string" minOccurs="0" />
                                 <xs:element name="ID" type="xs:double" minOccurs="0" />
                              </xs:sequence>
                           </xs:complexType>
                        </xs:element>
                     </xs:choice>
                  </xs:complexType>
               </xs:element>
            </xs:schema>
            <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
               <DocumentElement xmlns="">
                  <RESULT diffgr:id="RESULT1" msdata:rowOrder="0" diffgr:hasChanges="inserted">
                     <Group>GLOBAL FUNDS</Group>
                     <Value>43.909752322865359</Value>
                     <Name>DIREXION DLY JR GOLD BEAR 3X</Name>
                     <ID>2</ID>
                  </RESULT>
                  <RESULT diffgr:id="RESULT2" msdata:rowOrder="1" diffgr:hasChanges="inserted">
                     <Group>GLOBAL FUNDS</Group>
                     <Value>49.355249530959405</Value>
                     <Name>DIRXN DAILY JR BULL GOLD 3X</Name>
                     <ID>3</ID>
                  </RESULT>
                  <RESULT diffgr:id="RESULT3" msdata:rowOrder="2" diffgr:hasChanges="inserted">
                     <Group>GLOBAL FUNDS</Group>
                     <Value>25.683552161722936</Value>
                     <Name>Direxion Daily Small Cap Bull 3X Shares</Name>
                     <ID>4</ID>
                  </RESULT>
                  <RESULT diffgr:id="RESULT4" msdata:rowOrder="3" diffgr:hasChanges="inserted">
                     <Group>GLOBAL FUNDS</Group>
                     <Value>38.662180870630991</Value>
                     <Name>Direxion Daily Gold Miners Bear 3X Shrs</Name>
                     <ID>5</ID>
                  </RESULT>
                  <RESULT diffgr:id="RESULT5" msdata:rowOrder="4" diffgr:hasChanges="inserted">
                     <Group>GLOBAL FUNDS</Group>
                     <Value>28.857511273261132</Value>
                     <Name>Direxion Daily Small Cap Bear 3X Shares</Name>
                     <ID>6</ID>
                  </RESULT>
                  ...
               </DocumentElement>
            </diffgr:diffgram>
         </UniversalFieldGroupingResult>
      </UniversalFieldGroupingResponse>
   </soap:Body>
</soap:Envelope>

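I know next to nothing about XML, but I am trying to use the XML package, specifically xmlToList() or xmlToDataFrame(). I can use the list function to eventually get the structure I want, but it is painful. I cannot get xmlToDataFrame() to do anything useful (all I get is a single string containing all the data). Here is the working xmlToList() approach: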

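library(XML)
theResult2 <- xmlToList(theResult$value())
# Drill down to the rows; the element name is spelled out in full rather than relying on partial matching by `$`
theResult2 <- theResult2$Body$UniversalFieldGroupingResponse$UniversalFieldGroupingResult$diffgram$DocumentElement
# One row per RESULT element, with the fields reordered to Name, Group, Value
test <- as.data.frame(unname(t(sapply(theResult2, function(x) unlist(x)[c(3, 1, 2)]))))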

This gives me something I can work with:

> test
                                        V1            V2                 V3
1             DIREXION DLY JR GOLD BEAR 3X  GLOBAL FUNDS 43.909752322865359
2              DIRXN DAILY JR BULL GOLD 3X  GLOBAL FUNDS 49.355249530959405
3  Direxion Daily Small Cap Bull 3X Shares  GLOBAL FUNDS 25.683552161722936
4  Direxion Daily Gold Miners Bear 3X Shrs  GLOBAL FUNDS 38.662180870630991
5  Direxion Daily Small Cap Bear 3X Shares  GLOBAL FUNDS 28.857511273261132
6  Direxion Daily Financial Bear 3X Shares  GLOBAL FUNDS 27.286991074091898

Does anyone have a more elegant way to do this, or any suggestions?

1 answer:

Answer 0: (score: 1)

Thanks mainly to @lukeA, I got the following to work:

> tester <- xmlParse(theResult$value())
> dd = xmlToDataFrame(getNodeSet(tester, "//RESULT"))
> dd
           Group              Value                                    Name ID
1   GLOBAL FUNDS 43.909752322865359            DIREXION DLY JR GOLD BEAR 3X  2
2   GLOBAL FUNDS 49.355249530959405             DIRXN DAILY JR BULL GOLD 3X  3
3   GLOBAL FUNDS 25.683552161722936 Direxion Daily Small Cap Bull 3X Shares  4
4   GLOBAL FUNDS 38.662180870630991 Direxion Daily Gold Miners Bear 3X Shrs  5
5   GLOBAL FUNDS 28.857511273261132 Direxion Daily Small Cap Bear 3X Shares  6

That works for me, and I appreciate it!
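For anyone who wants to try the approach above without access to the SOAP service, here is a minimal, self-contained sketch. The XML string is a trimmed, hypothetical stand-in for the real response (two rows only, attributes removed), and the names `xml_text`, `doc`, and `rows` are made up for illustration. The XPath `//RESULT` needs no namespace prefix because `DocumentElement` declares `xmlns=""`, which puts the `RESULT` elements in no namespace. With `stringsAsFactors = FALSE` the columns come back as character, so the sketch also converts `Value` and `ID` to numeric, which seems to match the `xs:double` types declared in the schema.

```r
library(XML)

# Trimmed, hypothetical stand-in for the SOAP response (assumption: only the
# parts relevant to the row extraction are kept)
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<UniversalFieldGroupingResult xmlns="http://rixtrema.net/">
  <DocumentElement xmlns="">
    <RESULT>
      <Group>GLOBAL FUNDS</Group>
      <Value>43.909752322865359</Value>
      <Name>DIREXION DLY JR GOLD BEAR 3X</Name>
      <ID>2</ID>
    </RESULT>
    <RESULT>
      <Group>GLOBAL FUNDS</Group>
      <Value>49.355249530959405</Value>
      <Name>DIRXN DAILY JR BULL GOLD 3X</Name>
      <ID>3</ID>
    </RESULT>
  </DocumentElement>
</UniversalFieldGroupingResult>'

# Parse the text into an internal document so XPath queries can be used
doc <- xmlParse(xml_text, asText = TRUE)

# Select every RESULT element; each one becomes a row
rows <- getNodeSet(doc, "//RESULT")

# Child element names become column names; keep text as character, not factor
dd <- xmlToDataFrame(nodes = rows, stringsAsFactors = FALSE)

# Convert the numeric fields explicitly
dd$Value <- as.numeric(dd$Value)
dd$ID <- as.numeric(dd$ID)

str(dd)
```

With the real response, replacing `xmlParse(xml_text, asText = TRUE)` with `xmlParse(theResult$value())` as in the answer should leave the rest unchanged.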