I'm having some trouble parsing an XML file into a data frame in R.
I have the following XML:
<?xml version="1.0" encoding="windows-1251"?>
<dlc ac="ED29099541DB7B022D00E4179F00" softversion="0.2">
<statistics enterprise="Организация">
<shop Id="4" GUID="{F5D518E4-3C80-44E9-835B-D87CC35A7BDB}"
worktimefrom="2015-04-03 08:00:00" worktimeto="2015-04-03 20:00:00"
name="Объект" clientId="Client 1">
<sensor GUID="{63017726-D121-4EB3-A684-BC3D27AED119}" GCGUID="00000000-0000-0000-0000-000000000000"
Id="25" type="1" minortype="1" address="01"
name="Устройство" balance="0" devtype="1">
<stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:40:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:41:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:42:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:43:00" realin="1" realout="1" />
<stat datetime="2017-01-20 09:44:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:52:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:53:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:56:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:57:00" realin="0" realout="1" />
<stat datetime="2017-01-20 10:08:00" realin="0" realout="1" />
<stat datetime="2017-01-20 10:16:00" realin="0" realout="1" />
</sensor>
</shop>
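I need to parse this into a data frame in R. How can I do that?
Answer
It's not clear exactly what you want in the data frame, but here is one solution.
First, the data: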
file <- '
<?xml version="1.0" encoding="windows-1251"?>
<dlc ac="ED29099541DB7B022D00E4179F00" softversion="0.2">
<statistics enterprise="Организация">
<shop Id="4" GUID="{F5D518E4-3C80-44E9-835B-D87CC35A7BDB}"
worktimefrom="2015-04-03 08:00:00" worktimeto="2015-04-03 20:00:00"
name="Объект" clientId="Client 1">
<sensor GUID="{63017726-D121-4EB3-A684-BC3D27AED119}" GCGUID="00000000-0000-0000-0000-000000000000"
Id="25" type="1" minortype="1" address="01"
name="Устройство" balance="0" devtype="1">
<stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:40:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:41:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:42:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:43:00" realin="1" realout="1" />
<stat datetime="2017-01-20 09:44:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:52:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:53:00" realin="0" realout="1" />
<stat datetime="2017-01-20 09:56:00" realin="1" realout="0" />
<stat datetime="2017-01-20 09:57:00" realin="0" realout="1" />
<stat datetime="2017-01-20 10:08:00" realin="0" realout="1" />
<stat datetime="2017-01-20 10:16:00" realin="0" realout="1" />
</sensor>
</shop>'
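Next, use rvest to extract the attributes of every stat element and combine them into a data frame:
library(rvest)  # also re-exports the %>% pipe

# Parse the string leniently as HTML and select every <stat> element
lines <- read_html(file) %>% html_nodes('stat')

# Each call returns one value per <stat> element; attributes come back as
# strings, so the counts are converted to integers
time    <- lines %>% html_attr('datetime')
realin  <- lines %>% html_attr('realin') %>% as.integer()
realout <- lines %>% html_attr('realout') %>% as.integer()

df <- data.frame(time, realin, realout, stringsAsFactors = FALSE)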
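Because the input is XML rather than HTML, it can also be parsed strictly with xml2 (the package rvest is built on), selecting nodes with XPath. This also makes it easy to keep attributes of the enclosing sensor element on every row. A minimal sketch, assuming the real file is well formed: read_xml() rejects unclosed tags, so the shortened copy of the sample below adds the closing </statistics> and </dlc> tags that the excerpt in the question omits. The names xml_text, stats and df_xml, the sensor_id column, and the UTC time zone are illustrative choices, not part of the original answer.

```r
library(xml2)

# Shortened, well-formed copy of the sample data (assumption: the real file
# closes <statistics> and <dlc>). Cyrillic attributes are left out for brevity.
xml_text <- '<dlc softversion="0.2">
<statistics>
<shop Id="4">
<sensor Id="25" type="1">
<stat datetime="2017-01-20 09:37:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:38:00" realin="1" realout="2" />
<stat datetime="2017-01-20 09:39:00" realin="1" realout="0" />
</sensor>
</shop>
</statistics>
</dlc>'

doc <- read_xml(xml_text)

# XPath: every <stat> element anywhere in the document
stats <- xml_find_all(doc, ".//stat")

# One enclosing <sensor> per <stat>; xml_find_first() keeps the input length,
# so the sensor Id lines up row by row even when there are several sensors
sensors <- xml_find_first(stats, "parent::sensor")

df_xml <- data.frame(
  sensor_id = as.integer(xml_attr(sensors, "Id")),
  # Time zone is an assumption; the file does not state one
  time      = as.POSIXct(xml_attr(stats, "datetime"), tz = "UTC"),
  realin    = as.integer(xml_attr(stats, "realin")),
  realout   = as.integer(xml_attr(stats, "realout"))
)
df_xml
```

Unlike the rvest version, this one fails loudly on malformed input instead of silently repairing it, and it returns proper date-time and integer columns.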
The result:
> df
##                   time realin realout
## 1  2017-01-20 09:37:00      1       2
## 2  2017-01-20 09:38:00      1       2
## 3  2017-01-20 09:39:00      1       0
## 4  2017-01-20 09:40:00      0       1
## 5  2017-01-20 09:41:00      1       0
## 6  2017-01-20 09:42:00      1       0
## 7  2017-01-20 09:43:00      1       1
## 8  2017-01-20 09:44:00      0       1
## 9  2017-01-20 09:52:00      1       0
## 10 2017-01-20 09:53:00      0       1
## 11 2017-01-20 09:56:00      1       0
## 12 2017-01-20 09:57:00      0       1