Parsing an XML file whose child nodes all share the same name

Date: 2019-03-17 21:40:22

Tags: r xml

I have an XML file; a shortened version is below:

<resultset>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
    <column name="ParameterId">MM/LVIDd</column>
    <column name="ResultIdentifier">Average</column>
    <column name="ResultValue">0.05617021151</column>
  </row>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
    <column name="ParameterId">MM/LVIDs</column>
    <column name="ResultIdentifier">Measurement No. 1</column>
    <column name="ResultValue">0.05341702</column>
  </row>
</resultset>

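The ideal output is a data frame in which each column name (e.g. indexpatient) becomes a column and the values become the rows.

Can someone help me do this in R?

I'm stuck because every child node has the same element name, column; only its name attribute differs.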

1 answer:

Answer 0 (score: -1)

Here is a solution based on the following question/answer: R XML - combining parent and child nodes(w same name) into data frame

library(xml2)
library(dplyr)
page<-read_xml('<resultset>
  <row>
         <column name="indexpatient">2</column>
         <column name="height" null="true"></column>
         <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
         <column name="ParameterId">MM/LVIDd</column>
         <column name="ResultIdentifier">Average</column>
         <column name="ResultValue">0.05617021151</column>
         </row>
         <row>
         <column name="indexpatient">2</column>
         <column name="height" null="true"></column>
         <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
         <column name="ParameterId">MM/LVIDs</column>
         <column name="ResultIdentifier">Measurement No. 1</column>
         <column name="ResultValue">0.05341702</column>
         </row>
         </resultset>')


rows <- page %>% xml_find_all('//row')

dfs <- lapply(rows, function(node) {
   # read the "name" attribute of every child node
   col_names <- node %>% xml_children() %>% xml_attr("name")
   # read the text value of every child node
   values <- node %>% xml_children() %>% xml_text()

   # build a one-row data frame and label its columns
   df <- data.frame(t(values), stringsAsFactors = FALSE)
   names(df) <- col_names
   df
})

# bind the per-row data frames into the final data frame
answer <- bind_rows(dfs)
answer

#   indexpatient height           ParameterMeasure ParameterId  ResultIdentifier   ResultValue
# 1            2        Cardiac/MM/Dimension/LVIDd    MM/LVIDd           Average 0.05617021151
# 2            2        Cardiac/MM/Dimension/LVIDd    MM/LVIDs Measurement No. 1    0.05341702
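
In the result above every column is a character string, and the height cells are empty strings rather than missing values. A possible follow-up, sketched below, maps null="true" to NA and lets R guess the column types. It assumes that null="true" means "no value"; the question does not say so. The helper name row_to_df is hypothetical, and the XML is a shortened copy of the question's data so that the block runs on its own. It uses base R's rbind instead of dplyr::bind_rows.

```r
library(xml2)

# Shortened copy of the question's XML (two of the six columns dropped).
doc <- read_xml('<resultset>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultIdentifier">Average</column>
    <column name="ResultValue">0.05617021151</column>
  </row>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultIdentifier">Measurement No. 1</column>
    <column name="ResultValue">0.05341702</column>
  </row>
</resultset>')

# Hypothetical helper (not part of xml2): turn one <row> into a one-row data frame.
row_to_df <- function(row) {
  cols   <- xml_children(row)
  values <- xml_text(cols)

  # Assumption: null="true" marks a missing value, so store NA instead of "".
  null_flag <- xml_attr(cols, "null")
  values[!is.na(null_flag) & null_flag == "true"] <- NA_character_

  df <- data.frame(t(values), stringsAsFactors = FALSE)
  names(df) <- xml_attr(cols, "name")
  df
}

rows   <- xml_find_all(doc, "//row")
result <- do.call(rbind, lapply(rows, row_to_df))

# Let R guess a type per column; as.is = TRUE keeps text as character.
result[] <- lapply(result, type.convert, as.is = TRUE)
str(result)
```

Converting the types after binding, rather than row by row, means each column is guessed from all of its values at once, so a column cannot end up numeric in one row and character in another.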