R: How do I parse XML into a data frame?

Asked: 2017-08-11 17:17:09

Tags: r xml xml-parsing

I am having trouble parsing an XML file into a data frame in R.

I have the following XML:

<?xml version="1.0" encoding="windows-1251"?>
<dlc ac="ED29099541DB7B022D00E4179F00" softversion="0.2">
  <statistics enterprise="Организация">
  <shop Id="4" GUID="{F5D518E4-3C80-44E9-835B-D87CC35A7BDB}"
        worktimefrom="2015-04-03 08:00:00" worktimeto="2015-04-03 20:00:00"
        name="Объект" clientId="Client 1">
  <sensor GUID="{63017726-D121-4EB3-A684-BC3D27AED119}"
          GCGUID="00000000-0000-0000-0000-000000000000" Id="25" type="1"
          minortype="1" address="01" name="Устройство" balance="0" devtype="1">
    <stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
    <stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
    <stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:40:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:41:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:42:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:43:00" realin="1" realout="1" />
    <stat datetime="2017-01-20 09:44:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:52:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:53:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:56:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:57:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 10:08:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 10:16:00" realin="0" realout="1" />
  </sensor>
</shop>

   

I need to parse it into a data frame in R. How can I do that?

1 Answer:

Answer 0 (score: 0)

It is not clear exactly what you want the data frame to contain, but here is my solution.

First, the data:

file <- '
<?xml version="1.0" encoding="windows-1251"?>
<dlc ac="ED29099541DB7B022D00E4179F00" softversion="0.2">
  <statistics enterprise="Организация">
  <shop Id="4" GUID="{F5D518E4-3C80-44E9-835B-D87CC35A7BDB}"
        worktimefrom="2015-04-03 08:00:00" worktimeto="2015-04-03 20:00:00"
        name="Объект" clientId="Client 1">
  <sensor GUID="{63017726-D121-4EB3-A684-BC3D27AED119}"
          GCGUID="00000000-0000-0000-0000-000000000000" Id="25" type="1"
          minortype="1" address="01" name="Устройство" balance="0" devtype="1">
    <stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
    <stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
    <stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:40:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:41:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:42:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:43:00" realin="1" realout="1" />
    <stat datetime="2017-01-20 09:44:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:52:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:53:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 09:56:00" realin="1" realout="0" />
    <stat datetime="2017-01-20 09:57:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 10:08:00" realin="0" realout="1" />
    <stat datetime="2017-01-20 10:16:00" realin="0" realout="1" />
  </sensor>
</shop>'

Now we use rvest to extract the attributes of each stat element and put them in a data frame:

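library(rvest)

# read_html() is lenient, so it accepts the snippet even though the
# <statistics> and <dlc> tags are never closed
lines <- read_html(file) %>% html_nodes('stat')

# Pull each attribute of the <stat> elements into its own character vector
time <- lines %>% html_attr('datetime')
realin <- lines %>% html_attr('realin')
realout <- lines %>% html_attr('realout')

df <- data.frame(time, realin, realout, stringsAsFactors = FALSE)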

The result is:

> df

##                   time realin realout
## 1  2017-01-20 09:37:00      1       2
## 2  2017-01-20 09:38:00      1       2
## 3  2017-01-20 09:39:00      1       0
## 4  2017-01-20 09:40:00      0       1
## 5  2017-01-20 09:41:00      1       0
## 6  2017-01-20 09:42:00      1       0
## 7  2017-01-20 09:43:00      1       1
## 8  2017-01-20 09:44:00      0       1
## 9  2017-01-20 09:52:00      1       0
## 10 2017-01-20 09:53:00      0       1
## 11 2017-01-20 09:56:00      1       0
## 12 2017-01-20 09:57:00      0       1
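
The columns above are all character vectors, because attribute values are read as text. As a possible variation, here is a minimal sketch that uses the xml2 package directly and converts the columns to typed values. It makes several assumptions that are not part of the original post: the sample is trimmed to three stat rows and a few attributes, the missing closing tags are added so that the stricter read_xml() can parse it, the names xml_input and stat_df are made up for illustration, and the timestamps are treated as UTC.

```r
library(xml2)

# Trimmed, well-formed copy of the sample (assumption: closing tags added,
# only three <stat> rows and a few attributes kept for brevity).
xml_input <- '<dlc ac="ED29099541DB7B022D00E4179F00" softversion="0.2">
  <statistics>
    <shop Id="4">
      <sensor Id="25">
        <stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
        <stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
        <stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
      </sensor>
    </shop>
  </statistics>
</dlc>'

# Parse as XML (not HTML) and select every <stat> element with XPath
doc <- read_xml(xml_input)
stats <- xml_find_all(doc, ".//stat")

# Build the data frame, converting text attributes to typed columns.
# The time zone is an assumption; the source does not state one.
stat_df <- data.frame(
  time = as.POSIXct(xml_attr(stats, "datetime"),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  realin = as.integer(xml_attr(stats, "realin")),
  realout = as.integer(xml_attr(stats, "realout")),
  stringsAsFactors = FALSE
)

str(stat_df)
```

With typed columns, something like sum(stat_df$realin) or filtering by a time range works without further conversion. For the snippet exactly as posted, with its unclosed tags, the lenient read_html() approach shown in the answer is likely the safer choice.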