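I'm trying to build a data frame by extracting outage data from an XML file and associating each outage with a specific meter. A simplified example of the data looks like this: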
<MeterReadings Irn="311" Source="Remote">
<Meter MeterIrn="311" IsActive="true" />
<ConsumptionData>
</ConsumptionData>
<IntervalData>
<Reading TimeStamp="2016-10-13" />
</IntervalData>
<EventData>
<EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
<Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
</Event>
</EventData>
</MeterReadings>
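What I want is a data frame with the meter number in the first column and the time of each outage in the second column.
I tried the following expressions: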
outage.inv <- data.frame(xpathSApply(doc, '//Event[contains(@EventInfo, "Outage detected from Interval Data")]/ancestor::MeterReadings', xmlGetAttr, "Irn"))
outage.df <- data.frame(xpathSApply(doc, '//MeterReadings/EventData/EventSpec[@Type="Outage Detected from Interval Data"]/following-sibling::Event', xmlGetAttr, "TimeStamp"))
outage.inv <- cbind(outage.inv, outage.df)
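But the first expression pulls the meter number only once, so the lengths of the two columns don't match: in this case there is 1 meter number and there are 5 outage times. Is there a way to pull the ancestor's attribute once for every matching descendant?
I checked the following answers but couldn't figure it out.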
XPath to select element based on childs child value
R: How to get parent attributes and node values at the same time?
Any help is greatly appreciated.
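The count mismatch follows from XPath semantics: a location path evaluates to a node-set, and a node-set contains each distinct node only once. Five events that share one `MeterReadings` ancestor therefore yield a single ancestor node. The sketch below uses xml2 instead of the XML package from the question, and it assumes a hypothetical two-meter file in which the `<Root>` wrapper and meter `312` are invented for illustration. It first shows the de-duplication, then selects the events and resolves the ancestor separately for each one.

```r
library(xml2)

# Hypothetical two-meter file: the <Root> wrapper and meter 312 are made up for illustration
txt <- '<Root>
  <MeterReadings Irn="311"><EventData>
    <Event TimeStamp="2014-10-31 14:17:40" EventInfo="Outage detected from Interval Data."/>
    <Event TimeStamp="2014-10-31 14:16:20" EventInfo="Outage detected from Interval Data."/>
  </EventData></MeterReadings>
  <MeterReadings Irn="312"><EventData>
    <Event TimeStamp="2014-11-02 08:00:00" EventInfo="Outage detected from Interval Data."/>
  </EventData></MeterReadings>
</Root>'
doc <- read_xml(txt)

# A node-set holds each distinct node once, so three events yield only two ancestors
ancestors <- xml_find_all(doc, "//Event[contains(@EventInfo, 'Outage')]/ancestor::MeterReadings")
length(ancestors)

# Select the events first, then look up the ancestor for each event separately:
# xml_find_first() on a node-set returns one result per input node
events <- xml_find_all(doc, "//Event[contains(@EventInfo, 'Outage')]")
outages <- data.frame(
  meter = xml_attr(xml_find_first(events, "ancestor::MeterReadings"), "Irn"),
  timestamp = xml_attr(events, "TimeStamp"),
  stringsAsFactors = FALSE
)
outages
```

Because `meter` and `timestamp` are both computed from the same `events` node-set, the two columns always have the same length.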
Answer 0 (score: 0)
Another approach.
Here is the data:
txt <- ' <MeterReadings Irn="311" Source="Remote">
<Meter MeterIrn="311" IsActive="true" />
<ConsumptionData>
</ConsumptionData>
<IntervalData>
<Reading TimeStamp="2016-10-13" />
</IntervalData>
<EventData>
<EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
<Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
</Event>
<Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
</Event>
</EventData>
</MeterReadings>'
We can process the records a different way:
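library(xml2)
library(purrr)
library(dplyr)
doc <- read_xml(txt)
xml_find_all(doc, "//MeterReadings") %>%
  map_df(function(x) {
    # meter number of the current record
    meter <- xml_attr(x, "Irn")
    # note: the leading "//" searches the whole document, not only x
    xml_find_all(x, "//Event[contains(@EventInfo, 'Outage')]") %>%
      map_df(function(y) {
        # one row per outage event (tibble() replaces the deprecated data_frame())
        tibble(
          meter = meter,
          timestamp = xml_attr(y, "TimeStamp"),
          discovered_at = xml_attr(y, "DiscoveredAt")
        )
      })
  })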
This produces:
## # A tibble: 5 × 3
## meter timestamp discovered_at
## <chr> <chr> <chr>
## 1 311 2014-10-31 14:17:40 2014-11-01 12:05:28
## 2 311 2014-10-31 14:16:20 2014-11-01 12:05:28
## 3 311 2014-10-31 14:16:16 2014-11-01 12:05:28
## 4 311 2014-10-31 14:15:12 2014-11-01 12:05:28
## 5 311 2014-10-31 14:12:00 2014-11-01 12:05:28
Answer 1 (score: 0)
I modified the answer above so that it filters events per meter and no longer repeats every timestamp for every meter:
outage.df <- xml_find_all(doc, "//MeterReadings[EventData/Event[contains(@EventInfo, 'Outage')]]") %>%
  map_df(function(x) {
    meter <- xml_attr(x, "Irn")
    # quote the Irn so the predicate compares strings and also works for non-numeric IDs
    xml_find_all(x, sprintf("//MeterReadings[@Irn='%s']/EventData/Event[contains(@EventInfo, 'Outage')]", meter)) %>%
      map_df(function(y) {
        tibble(
          meter = meter,
          timestamp = xml_attr(y, "TimeStamp"),
          discovered_at = xml_attr(y, "DiscoveredAt")
        )
      })
  })
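The repetition this answer works around comes from the absolute path in the earlier answer. Inside `map_df`, `xml_find_all(x, "//Event...")` starts from the document root rather than from `x`, so with several meters every meter would collect every event. The sketch below assumes a hypothetical two-meter file in which the `<Root>` element and meter `312` are invented. It shows the difference between the two paths, then does the per-meter extraction with a relative `./` path instead of rebuilding the path from `@Irn`. It uses base R `lapply`/`rbind` in place of purrr/dplyr only to keep dependencies minimal.

```r
library(xml2)

# Hypothetical two-meter file: the <Root> wrapper and meter 312 are made up for illustration
txt <- '<Root>
  <MeterReadings Irn="311"><EventData>
    <Event TimeStamp="2014-10-31 14:17:40" EventInfo="Outage detected from Interval Data."/>
    <Event TimeStamp="2014-10-31 14:16:20" EventInfo="Outage detected from Interval Data."/>
  </EventData></MeterReadings>
  <MeterReadings Irn="312"><EventData>
    <Event TimeStamp="2014-11-02 08:00:00" EventInfo="Outage detected from Interval Data."/>
  </EventData></MeterReadings>
</Root>'
doc <- read_xml(txt)
meters <- xml_find_all(doc, "//MeterReadings")

# A leading "//" is absolute: every meter sees all 3 events in the document
absolute <- vapply(meters, function(x) length(xml_find_all(x, "//Event")), integer(1))
# A leading ".//" is relative: each meter sees only its own events
relative <- vapply(meters, function(x) length(xml_find_all(x, ".//Event")), integer(1))

# Per-meter extraction with a relative path; no need to rebuild the path from @Irn
outages <- do.call(rbind, lapply(meters, function(x) {
  events <- xml_find_all(x, "./EventData/Event[contains(@EventInfo, 'Outage')]")
  data.frame(
    # rep() keeps the columns aligned even for a meter with zero outages
    meter = rep(xml_attr(x, "Irn"), length(events)),
    timestamp = xml_attr(events, "TimeStamp"),
    stringsAsFactors = FALSE
  )
}))
outages
```

The relative form also avoids splicing attribute values into an XPath string, which can break if an `Irn` ever contains a quote character.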