R XML to Dataframe

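Time: 2018-02-28 09:49:55

Tags: r xml dataframe

I'm new to R and looking for some help. How can I convert the following XML into a data frame in R? The data frame should contain 3 columns, one for each column id.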

<?xml version="1.0" encoding="UTF-8" ?>
<results version="1" total-rows="3" current-page="1" current-page-start-row="1" current-page-end-row="25" execution-time="0.0781255">
  <columns>
    <column id="ReferenceNumber" data-type="ReferenceNumber">Reference Number</column>
    <column id="AllocatedTo" data-type="Allocation">Allocated To</column>
    <column id="Reason" data-type="Category">Category Code</column>
  </columns>
  <rows>
    <row case-reference="0150967018">
      <data column-id="ReferenceNumber">0150967018</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967118">
      <data column-id="ReferenceNumber">0150967118</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967218">
      <data column-id="ReferenceNumber">0150967218</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
  </rows>
</results>

2 Answers:

Answer 0 (score: 0)

Hope this helps!

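library(xml2)
library(dplyr)

# xml_string holds your XML as a character string (a file path also works)
xml_doc <- read_xml(xml_string)

# Collect every <data> value and reshape into one row per <row> element
df <- xml_doc %>%
  xml_find_all("//rows/row/data") %>%
  xml_text() %>%
  matrix(ncol = 3, byrow = TRUE) %>%
  as.data.frame(stringsAsFactors = FALSE)

# Use the display text of each <column> element as the column names
colnames(df) <- xml_doc %>%
  xml_find_all("//columns/column") %>%
  xml_text()
df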

The output is:

  Reference Number Allocated To        Category Code
1       0150967018       Suresh Actioned incorrectly
2       0150967118       Suresh Actioned incorrectly
3       0150967218       Suresh Actioned incorrectly
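
Since the question asks for the column ids as names, here is a minimal sketch of a variant that uses the same xml2 package. The `xml_string` variable and the trimmed two-row sample are illustrative assumptions, not part of the original answer. It takes the names from the `id` attribute and looks up each value by its `column-id`, so the number of columns is not hard-coded.

```r
library(xml2)

# Assumption: a trimmed copy of the XML from the question, held in a string.
xml_string <- '<results version="1" total-rows="2">
  <columns>
    <column id="ReferenceNumber" data-type="ReferenceNumber">Reference Number</column>
    <column id="AllocatedTo" data-type="Allocation">Allocated To</column>
    <column id="Reason" data-type="Category">Category Code</column>
  </columns>
  <rows>
    <row case-reference="0150967018">
      <data column-id="ReferenceNumber">0150967018</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967118">
      <data column-id="ReferenceNumber">0150967118</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
  </rows>
</results>'

doc <- read_xml(xml_string)

# Column names come from the id attribute rather than the display text.
col_ids <- xml_attr(xml_find_all(doc, "//columns/column"), "id")

rows <- xml_find_all(doc, "//rows/row")

# For each column id, pull the matching <data> value from every row.
# xml_find_first() keeps one result per row, so the columns stay aligned.
cols <- lapply(col_ids, function(id) {
  xml_text(xml_find_first(rows, sprintf("./data[@column-id='%s']", id)))
})
names(cols) <- col_ids

df <- as.data.frame(cols, stringsAsFactors = FALSE)
df
```

The values are kept as character strings, which preserves the leading zero in the reference numbers.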

Answer 1 (score: 0)

  

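XML to Data Frame: To handle the data in large files effectively, we read the data in the XML file as a data frame, then process the data frame for data analysis.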

# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)

When we execute the above code, it produces the following result:

  Reference Number Allocated To               Reason
1       0150967018       Suresh Actioned incorrectly
2       0150967118       Suresh Actioned incorrectly
3       0150967218       Suresh Actioned incorrectly

Since the data is now available as a data frame, we can use data-frame functions to read and manipulate it.
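
`xmlToDataFrame` is documented for shallow documents in which each record node holds its fields as child nodes. The XML in the question nests its records under `<rows>` and names every field `<data>`, so calling it on the whole file may not yield this table directly. Below is a minimal sketch of a more explicit route with the same XML package. It assumes every row lists its `<data>` values in the same order as `<columns>`; `xml_string` and the trimmed sample are illustrative.

```r
library(XML)

# Assumption: a trimmed copy of the XML from the question, held in a string.
xml_string <- '<results version="1" total-rows="2">
  <columns>
    <column id="ReferenceNumber" data-type="ReferenceNumber">Reference Number</column>
    <column id="AllocatedTo" data-type="Allocation">Allocated To</column>
    <column id="Reason" data-type="Category">Category Code</column>
  </columns>
  <rows>
    <row case-reference="0150967018">
      <data column-id="ReferenceNumber">0150967018</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967118">
      <data column-id="ReferenceNumber">0150967118</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
  </rows>
</results>'

# asText = TRUE tells xmlParse the argument is XML content, not a file name.
doc <- xmlParse(xml_string, asText = TRUE)

# Column ids from the <column> elements, cell values from the <data> elements.
col_ids <- xpathSApply(doc, "//columns/column", xmlGetAttr, "id")
values  <- xpathSApply(doc, "//rows/row/data", xmlValue)

# Assumption: each row has one <data> per column, in column order,
# so the flat vector can be folded row by row into a matrix.
xmldataframe <- as.data.frame(
  matrix(values, ncol = length(col_ids), byrow = TRUE),
  stringsAsFactors = FALSE
)
names(xmldataframe) <- col_ids
print(xmldataframe)
```

If rows can omit or reorder fields, matching on the `column-id` attribute would be safer than relying on position.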