Extracting XML into R

Time: 2018-05-07 06:17:20

Tags: r xml xml2

Using a POST request via the httr package, I get XML back in the following format:

<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime> 
      <Lat>99.9999999</Lat> 
      <Lon>999.9999999</Lon> 
      <Cog>999</Cog> 
      <Sog>999</Sog> 
      <Voltage>99</Voltage> 
      <Status>Description of status</Status> 
    </Report>
    <Report>
      ..
    </Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
   ..
  </Terminal>
</ReportDelivery>

I was able to get the "Report" nodes into a data frame using the two functions available at the link below:

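# Using functions from https://rud.is/rpubs/xml2power/
library(xml2)   # xml_find_all(), xml_find_first(), xml_children(), xml_name(), xml_text()
library(purrr)  # map(), set_names(), flatten_df(), and the %>% pipe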
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }

xtrct_df <- function(doc, top) {
  xml_find_first(doc, sprintf(".//%s", top)) %>%
    xml_children() %>%
    xml_name() %>%
    map(~{
      xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
        list() %>%
        set_names(tolower(.x))
    }) %>%
    flatten_df() %>%
    readr::type_convert()
}

x <- xtrct_df(doc, "Report")

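Within each Terminal node there are multiple Report nodes relating to one specific vessel, whose attributes are given on the Terminal node.

Currently, the columns in x are: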

names(x)
[1] "datetime" "lat"      "lon"      "cog"      "sog"      "voltage"  "status" 

How can I add the vessel's name as a column in this data frame? I can extract the name attribute using the following:

xattrs <- xpathSApply(z, "//*/Terminal/@name")

But I have no idea how to include it as a variable in the data frame. Any help would be much appreciated.
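For reference, xpathSApply comes from the XML package, while the rest of the code in the question uses xml2. A minimal sketch of the same attribute extraction with xml2 might look like the following. The sample document here is a hypothetical, shortened stand-in for the real response, and vessel_names is an illustrative name only.

```r
library(xml2)

# Hypothetical, shortened sample that mirrors the structure shown in the question
doc <- read_xml('<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report><Lat>1</Lat></Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
    <Report><Lat>2</Lat></Report>
  </Terminal>
</ReportDelivery>')

# One name per Terminal node, in document order
vessel_names <- xml_attr(xml_find_all(doc, ".//Terminal"), "name")
```

This yields one name per Terminal rather than one per Report, which is why the names cannot simply be bound onto the Report data frame as a column.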

1 Answer:

Answer 0 (score: 2)

Taking a somewhat different route from @hrbrmstr, we can use map_df over each Report element, finding its parent element and extracting the corresponding name attribute with xml_attr.

Created on 2018-05-07 by the reprex package (v0.2.0).

Code (x is assumed to hold the XML from the question as a character string):

library(xml2)
library(purrr)

col_names <- read_xml(x) %>%
  xml_find_first('.//Report') %>% 
  xml_children() %>% 
  xml_name() 

read_xml(x) %>% 
  xml_find_all(".//Report") %>% 
  map_df(~{
    parent_name <- xml_parent(.x) %>% 
      xml_attr('name') %>% 
      as.character()

    xml_children(.x) %>% 
      as_list() %>% 
      data.frame(stringsAsFactors = FALSE) %>% 
      set_names(col_names) %>% 
      cbind(VesselName=parent_name)
  })

#>              DateTime        Lat         Lon Cog Sog Voltage
#> 1 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#> 2 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#> 3 01/10/2014 15:30:45 99.9999999 999.9999999 999 999      99
#>                  Status  VesselName
#> 1 Description of status MV Vessel A
#> 2 Description of status MV Vessel A
#> 3 Description of status MV Vessel B
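As a self-contained variation on the same idea, the sketch below builds one row per Report and tags it with the name attribute of its parent Terminal, using only xml2 and base R. The sample XML, the helper report_to_row, and the column name vesselname are assumptions made for illustration. They are not part of the original answer.

```r
library(xml2)

# Hypothetical sample data in the question's format (values are placeholders)
xml_string <- '<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report><DateTime>01/10/2014 15:30:45</DateTime><Lat>1.5</Lat><Lon>2.5</Lon></Report>
    <Report><DateTime>01/10/2014 16:30:45</DateTime><Lat>1.6</Lat><Lon>2.6</Lon></Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
    <Report><DateTime>01/10/2014 15:30:45</DateTime><Lat>3.5</Lat><Lon>4.5</Lon></Report>
  </Terminal>
</ReportDelivery>'

doc <- read_xml(xml_string)
reports <- xml_find_all(doc, ".//Report")

# Turn one Report node into a one-row data frame and attach the parent's name
report_to_row <- function(report) {
  fields <- xml_children(report)
  values <- as.list(trimws(xml_text(fields)))
  names(values) <- tolower(xml_name(fields))
  row <- as.data.frame(values, stringsAsFactors = FALSE)
  row$vesselname <- xml_attr(xml_parent(report), "name")
  row
}

# Stack the per-report rows into a single data frame
result <- do.call(rbind, lapply(reports, report_to_row))
```

All columns come back as character here. Passing result through readr::type_convert(), as the question's xtrct_df does, would be one way to convert them to more specific types.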