在xml节点R上提取正确的值

时间:2018-11-27 20:32:52

标签: r xml

我具有以下xml结构

 <Resultados>
    <Zona_Carga>
      <zona_carga>ACAPULCO</zona_carga>
      <Valores>
        <Valor>
          <fecha>2017-04-19</fecha>
          <hora>1</hora>
          <pz>1063.91</pz>
          <pz_ene>968.54</pz_ene>
          <pz_per>95.37</pz_per>
          <pz_cng>0</pz_cng>
        </Valor>
        <Valor>
          <fecha>2017-04-19</fecha>
          <hora>2</hora>
          <pz>1000.3</pz>
          <pz_ene>909.27</pz_ene>
          <pz_per>91.03</pz_per>
          <pz_cng>0</pz_cng>
        </Valor> ...

在这里我有许多不同的“ zona_carga”值。我想制作一个像这样的数据框:

        fecha hora      pz  pz_ene pz_per pz_cng
1  2017-04-19    1 1063.91  968.54  95.37      0
2  2017-04-19    2  1000.3  909.27  91.03      0
3  2017-04-19    3  954.25  868.84  85.41      0
4  2017-04-19    4  939.86  855.47   84.4      0
5  2017-04-19    5  931.78  847.65  84.14      0
6  2017-04-19    6  955.97  870.63  85.34      0
7  2017-04-19    7 1025.11  943.27  81.83      0
8  2017-04-19    8 1111.06 1030.09  82.52  -1.55
9  2017-04-19    9 1422.03 1310.35 111.89  -0.21
10 2017-04-19   10 1497.15 1388.88 108.27      0
11 2017-04-19   11 1482.88 1381.24 101.65      0
12 2017-04-19   12 1483.35 1405.45   77.9      0
13 2017-04-19   13 1479.93 1405.95  73.98      0
14 2017-04-19   14 1427.65 1346.85   80.8      0

带有一个附加列,显示每个值对应于哪个zona_carga。这是我正在使用的代码:

url = getURL("https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML")
xmlfile <- xmlParse(url)
a <- xmlToDataFrame(getNodeSet(xmlfile,"//*/Valor"), collectNames=T)

有什么建议吗? 谢谢!

1 个答案:

答案 0 :(得分:1)

this.state

哪个会产生:

library(xml2)
library(httr)
library(tidyverse)

# Cross-platform, caching-download (be nice and don't waste bandwidth that isn't yours)
httr::GET(
  url = "https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML",
  httr::write_disk("~/Data/ws01.xml"),
  httr::progress()
) -> res

doc <- read_xml("~/Data/ws01.xml")

# this has namesapces so it's a pain and we'll make it painless
doc <- xml_ns_strip(doc)

xml_find_all(doc, ".//Resultados/Zona_Carga") %>%  # get main sections
  map_df(~{
    zone <- xml_find_all(.x, ".//zona_carga") %>% xml_text() # pull out the zone
    xml_find_all(.x, ".//Valores/Valor") %>% # get all the sub-records
      map_df(~{
        ch <- xml_children(.x) # get all the kids
        as.list(set_names( # mash them together
          xml_text(ch),
          xml_name(ch)
        ))
      }) %>%
      mutate(zona_carga = zone) # add in the zone
  }) %>%
  readr::type_convert() # get better column types