Parsing an XML file in R

Time: 2018-05-31 01:56:35

Tags: r xml

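I have an XML file that I want to parse with R. I ran the code below to parse it into a data frame and got the output shown underneath. In the data frame I cannot get datetime="2016-12-15T22:45:40.000Z", although I am able to get the cumulative operating hours, 1059.64. I want to parse the datetime attribute from the XML document into the data frame as well. Any ideas on how to do this?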

    library(XML)

    xmldataframe <- xmlToDataFrame("xamal.xml")
    xmlfile <- xmlParse("xamal.xml")
    rootnode <- xmlRoot(xmlfile)
    rootsize <- xmlSize(rootnode)

    print(rootsize)
    # [1] 103
    print(rootnode[[11]][[5]])
    # <CumulativeOperatingHours datetime="2016-12-15T22:45:40.000Z">
    # <Hour>1059.60</Hour>
    # </CumulativeOperatingHours>

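Below is the XML file I am trying to read into R. It is a long file, so I need to read it into R from disk and build a data frame that includes the datetime attribute.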

<?xml version="1.0" encoding="UTF-8"?>
<Group xmlns="http://standards.is.com/is/151/-1" version="2" Time="2018-05-30T19:33:44.352Z">
   <Links>
      <rel>self</rel>
      <href>https://cloud.com/1</href>
   </Links>
   <Links>
      <rel>last</rel>
      <href>https://cloud.com/2</href>
   </Links>
   <Links>
      <rel>next</rel>
      <href>https://cloud.com/3</href>
   </Links>
   <Equip>
      <EquipHead>
         <Name>CAST</Name>
         <Model>1100</Model>
         <EquipmentID>Desk</EquipmentID>
         <SerialNumber>12312312</SerialNumber>
         <PIN>123123</PIN>
      </EquipHead>
      <Location datetime="2012-06-25T11:14:54.000Z">
         <Latitude>44.57</Latitude>
         <Longitude>-95.51</Longitude>
      </Location>
      <OperatingHours datetime="2012-03-01T17:42:37.000Z">
         <Hour>198.80</Hour>
      </OperatingHours>
   </Equip>
   <Equip>
      <EquipHead>
         <Name>Yuza</Name>
         <Model>L208</Model>
         <EquipmentID>4DW772GP</EquipmentID>
         <SerialNumber>4DW772GP</SerialNumber>
         <PIN>1DW772GPVJF</PIN>
      </EquipHead>
      <Location datetime="2018-05-30T19:22:46.000Z">
         <Latitude>47.518556</Latitude>
         <Longitude>-70.422444</Longitude>
      </Location>
      <IdleHours datetime="2018-05-30T19:02:46.000Z">
         <Hour>33.74</Hour>
      </IdleHours>
      <OperatingHours datetime="2018-05-30T19:22:48.000Z">
         <Hour>72.35</Hour>
      </OperatingHours>
      <Distance datetime="2018-05-30T19:02:46.000Z">
         <Odometer>kilometre</Odometer>
         <OdometerV>30.9</OdometerV>
      </Distance>
      <FuelUsed datetime="2018-05-30T19:02:46.000Z">
         <FuelUnits>litre</FuelUnits>
         <Consumed>395</Consumed>
      </FuelUsed>
   </Equip>
   <Equip>
      <EquipHead>
         <OEMName>CALL</OEMName>
         <Model>562A</Model>
         <EquipmentID>1W2772G</EquipmentID>
         <SerialNumber>1TT772GPTE</SerialNumber>
         <PIN>1MM772GPTE</PIN>
      </EquipHead>
      <Location datetime="2018-05-30T07:00:17.000Z">
         <Latitude>22.809278</Latitude>
         <Longitude>-45.316417</Longitude>
      </Location>
      <IdleHours datetime="2018-05-24T20:37:03.000Z">
         <Hour>457.10</Hour>
      </IdleHours>
      <OperatingHours datetime="2018-05-30T18:25:18.000Z">
         <Hour>26.35</Hour>
      </OperatingHours>
      <Distance datetime="2018-05-23T13:26:37.000Z">
         <Units>kilometre</Units>
         <OdometerV>5075.6997</OdometerV>
      </Distance>
      <FuelUsed datetime="2018-05-24T20:37:03.000Z">
         <FuelUnits>litre</FuelUnits>
         <FuelConsumed>2548</FuelConsumed>
      </FuelUsed>
   </Equip>
</Group>

1 Answer:

Answer 0 (score: 1)

Consider using cbind to combine the result of the undocumented internal function XML:::xmlAttrsToDataFrame with that of the documented xmlToDataFrame:

library(XML)

doc <- xmlParse('/path/to/input.xml')
namespaces <- c(n="http://standards.is.com/is/151/-1")

xmldataframe <- cbind(xmlToDataFrame(doc, nodes=getNodeSet(doc, "//n:OperatingHours", namespaces)),
                      XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//n:OperatingHours", namespaces)))
xmldataframe 

#     Hour                 datetime
# 1 198.80 2012-03-01T17:42:37.000Z
# 2  72.35 2018-05-30T19:22:48.000Z
# 3  26.35 2018-05-30T18:25:18.000Z
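
Because XML:::xmlAttrsToDataFrame is internal and could change between package versions, the same data frame can probably also be built from documented functions only. The following is a minimal sketch, not the answerer's method: it uses xpathSApply with xmlValue and xmlGetAttr, and it inlines a trimmed copy of the question's XML (only the OperatingHours elements) so that it runs on its own. The names xml_text, ns and hours are illustrative, and the sketch assumes every OperatingHours element has exactly one Hour child, so that the two vectors line up row by row.

```r
library(XML)

# Trimmed copy of the question's document, inlined for a self-contained example.
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<Group xmlns="http://standards.is.com/is/151/-1" version="2">
   <Equip>
      <OperatingHours datetime="2012-03-01T17:42:37.000Z">
         <Hour>198.80</Hour>
      </OperatingHours>
   </Equip>
   <Equip>
      <OperatingHours datetime="2018-05-30T19:22:48.000Z">
         <Hour>72.35</Hour>
      </OperatingHours>
   </Equip>
   <Equip>
      <OperatingHours datetime="2018-05-30T18:25:18.000Z">
         <Hour>26.35</Hour>
      </OperatingHours>
   </Equip>
</Group>'

doc <- xmlParse(xml_text, asText = TRUE)

# The document declares a default namespace, so XPath needs a prefix bound to it.
ns <- c(n = "http://standards.is.com/is/151/-1")

# Element text and attribute values, extracted in document order.
hour_txt <- xpathSApply(doc, "//n:OperatingHours/n:Hour", xmlValue,
                        namespaces = ns)
dt_txt   <- xpathSApply(doc, "//n:OperatingHours", xmlGetAttr, "datetime",
                        namespaces = ns)

# Assumption: one Hour per OperatingHours, so both vectors have equal length.
hours <- data.frame(
  Hour     = as.numeric(trimws(hour_txt)),
  datetime = as.POSIXct(dt_txt, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"),
  stringsAsFactors = FALSE
)

str(hours)
```

To read the real file instead, replace the xmlParse call with xmlParse('/path/to/input.xml'). Converting the attribute with as.POSIXct is optional; leaving dt_txt as character gives the same columns as the cbind approach above.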