How to parse this into a data frame in R? xml/html

Time: 2021-02-26 14:49:31

Tags: html r xml api parsing

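I'm new to R and don't know much about XML/HTML. I have a .txt file that I'm trying to parse and convert into a data frame. Below is a sample of what the text file looks like. It contains invalid characters such as: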

  1. '
  2. "
  3. 
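
For what it's worth, the apostrophe and the double quote are not actually invalid in XML element text: only '<' and '&' must be escaped there, and a reference such as &#x0D; (which appears in the sample below) is simply a carriage return. A minimal sketch, assuming the xml2 package (loaded in the question but not used); snippet, doc, txt and clean are hypothetical names:

```r
library(xml2)

# One element whose text contains an apostrophe, double quotes and a
# carriage-return character reference, like the fields in the sample.
snippet <- '<detailedDescription>Example\'s "ward"&#x0D;
next line</detailedDescription>'

doc <- read_xml(snippet)          # parses without any gsub() clean-up
txt <- xml_text(xml_root(doc))    # &#x0D; is decoded to a real "\r"
print(txt)

# If the carriage returns are unwanted, strip them after parsing, not before.
clean <- gsub("\r", "", txt, fixed = TRUE)
print(clean)
```

Removing these characters with gsub() before parsing is therefore probably unnecessary, and it risks altering the data.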

Here is a sample of the file and my code after using gsub to remove those characters, but I still get an error:

"x"
"1" "<?xml version=\"1.0\" ?><response status='ok'><serviceRequestList><serviceRequest><accountManagerId>11111</accountManagerId><billable>0</billable><billableTotal>0.0000000000</billableTotal><billingStatus>Not Billed</billingStatus><costTotal>0.0000</costTotal><customerContactEmail>example@imperial.nhs.uk</customerContactEmail><customerContactId>2222222</customerContactId><customerContactName>Example Example</customerContactName><customerContactPhone>0044 (0)000 111 2222</customerContactPhone><customerContactPhoneMobile></customerContactPhoneMobile><customerId>444444</customerId><customerLocationCity>London</customerLocationCity><customerLocationCountry>United Kingdom</customerLocationCountry><customerLocationId>9999999</customerLocationId><customerLocationName>Example's Hospital</customerLocationName><customerLocationNotes></customerLocationNotes><customerLocationPostalCode>W2 1NY</customerLocationPostalCode><customerLocationState>Greater London</customerLocationState><customerLocationStreetAddress>Example Street</customerLocationStreetAddress><customerLocationZone></customerLocationZone><customerName>Example  Healthcare (EXAMPLE)</customerName><dateTimeCreated>2010-04-06T09:47:25</dateTimeCreated><dateTimeClosed>2011-05-24T07:32:05.240</dateTimeClosed><description>Example - Ex/CANCELED</description><detailedDescription></detailedDescription><priority>3</priority><priorityLabel>Medium</priorityLabel><serviceManagerId>0</serviceManagerId><serviceRequestId>1007</serviceRequestId><status>Closed</status><timeOpen_hours>9909.7500000</timeOpen_hours><type></type></serviceRequest><serviceRequest><accountManagerId>11111</accountManagerId><billable>0</billable><billableTotal>0.0000000000</billableTotal><billingStatus>Not Billed</billingStatus><costTotal>0.0000</costTotal><customerContactEmail>example.example@gstt.nhs.uk, example2.example2@gstt.nhs.uk</customerContactEmail><customerContactId>5555555</customerContactId><customerContactName>Ex 
Example</customerContactName><customerContactPhone>88888 444444</customerContactPhone><customerContactPhoneMobile>07817 738912</customerContactPhoneMobile><customerId>957056</customerId><customerLocationCity>London</customerLocationCity><customerLocationCountry>United Kingdom</customerLocationCountry><customerLocationId>1372407</customerLocationId><customerLocationName>Example' Hospital</customerLocationName><customerLocationNotes></customerLocationNotes><customerLocationPostalCode>GH1 7EH</customerLocationPostalCode><customerLocationState>Greater London</customerLocationState><customerLocationStreetAddress>Example Bridge Road</customerLocationStreetAddress><customerLocationZone></customerLocationZone><customerName>Example' Trust (EXTT)</customerName><dateTimeCreated>2010-06-10T07:37:58</dateTimeCreated><dateTimeClosed>2010-06-10T07:42:40</dateTimeClosed><description>Software -Example - 55555</description><detailedDescription>The example that I have created.&#x0D;
This is an example, I made up the data.&#x0D;
This is another line for the example. &#x0D;
</detailedDescription><priority>3</priority><priorityLabel>Medium</priorityLabel><serviceManagerId>0</serviceManagerId><serviceRequestId>6007</serviceRequestId><status>Closed</status><timeOpen_hours>0.0833000</timeOpen_hours><type>Problem</type></serviceRequest></serviceRequestList></response>"
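
The sample above starts with a "x" line and a "1" field, and every double quote inside the XML is escaped as \". That looks like what write.table() produces for a one-column data frame (a guess; the question does not say how Sample.txt was saved). If so, the file does not begin with '<', which is exactly what the "Start tag expected, '<' not found" error complains about. A sketch contrasting write.table() with writeLines(), where xml_string is a short hypothetical stand-in for the real response:

```r
library(xml2)

# Hypothetical stand-in for the raw API response text.
xml_string <- '<?xml version="1.0" ?><response status=\'ok\'><serviceRequestList></serviceRequestList></response>'

# write.table() adds a header ("x"), a row name ("1") and escapes the quotes,
# so the file no longer starts with "<".
wrapped <- tempfile(fileext = ".txt")
write.table(data.frame(x = xml_string), wrapped)
print(readLines(wrapped))

# writeLines() saves the text exactly as-is, so an XML parser can read it.
raw_file <- tempfile(fileext = ".xml")
writeLines(xml_string, raw_file)
doc <- read_xml(raw_file)
print(xml_name(xml_root(doc)))
```

Saving the response with writeLines() (or parsing the string directly without writing a file) avoids the wrapper entirely.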
library(httr)
library(xml2)
library(dplyr)
library(XML)
library(plyr)

#Change working directory to be able to save on network share drive
setwd("\\\\mwo-file\\Example")

#Parsing the clean XML File
data <- xmlTreeParse("Sample.txt")

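Running this gives "Error: 1: Start tag expected, '<' not found". Even after I edited the text file so that '<' is the first character, it still doesn't work. When I try the following instead, it runs, but the resulting object looks messy and I don't know how to retrieve the information from the nodes I want: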

#Parsing the clean XML File
data <- htmlTreeParse("Sample.txt")
data
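
One way to pull specific nodes out of a parsed document is XPath. A minimal sketch with xml2 (an assumption; the question uses XML::htmlTreeParse), on a trimmed, hypothetical xml_string whose second record deliberately lacks a status element:

```r
library(xml2)

# Hypothetical, trimmed response: two records, the second one has no <status>.
xml_string <- paste0(
  '<response status="ok"><serviceRequestList>',
  '<serviceRequest><serviceRequestId>1007</serviceRequestId>',
  '<status>Closed</status></serviceRequest>',
  '<serviceRequest><serviceRequestId>6007</serviceRequestId></serviceRequest>',
  '</serviceRequestList></response>'
)
doc <- read_xml(xml_string)

# Absolute XPath: every <serviceRequestId> directly under a <serviceRequest>.
ids <- xml_text(xml_find_all(doc, "//serviceRequest/serviceRequestId"))

# Relative XPath per record: xml_find_first() returns one result per record
# (a missing node where nothing matches), so values stay aligned with ids.
records <- xml_find_all(doc, "//serviceRequest")
status <- xml_text(xml_find_first(records, "./status"))

print(ids)
print(status)
```

The per-record form is the safer choice when some records may omit a field, because a missing node becomes NA instead of shifting the remaining values.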

Does anyone know the best way to parse this sample file and build a data frame from it?
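
A minimal sketch of one approach, assuming xml2 and that every serviceRequest has the same child elements (true for the two records in the sample): one row per serviceRequest, one column per child element. xml_string is a shortened, hypothetical version of the sample, and request_to_row() is a hypothetical helper:

```r
library(xml2)

# Hypothetical, shortened sample: two records with the same child elements.
xml_string <- paste0(
  '<?xml version="1.0" ?><response status=\'ok\'><serviceRequestList>',
  '<serviceRequest><serviceRequestId>1007</serviceRequestId>',
  '<customerLocationName>Example\'s Hospital</customerLocationName>',
  '<dateTimeCreated>2010-04-06T09:47:25</dateTimeCreated><type></type></serviceRequest>',
  '<serviceRequest><serviceRequestId>6007</serviceRequestId>',
  '<customerLocationName>Example\' Hospital</customerLocationName>',
  '<dateTimeCreated>2010-06-10T07:37:58</dateTimeCreated><type>Problem</type></serviceRequest>',
  '</serviceRequestList></response>'
)

# Turn one <serviceRequest> node into a one-row data frame:
# column names come from the child element names, values from their text.
request_to_row <- function(node) {
  fields <- xml_children(node)
  values <- as.list(xml_text(fields))
  names(values) <- xml_name(fields)
  as.data.frame(values, stringsAsFactors = FALSE)
}

doc <- read_xml(xml_string)
requests <- xml_find_all(doc, "//serviceRequest")

# rbind() matches columns by name, so this assumes every record has the same fields.
df <- do.call(rbind, lapply(requests, request_to_row))

# Everything is character so far; convert the columns you need explicitly.
df$serviceRequestId <- as.integer(df$serviceRequestId)
df$dateTimeCreated <- as.POSIXct(df$dateTimeCreated,
                                 format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
str(df)
```

If some records can omit fields, dplyr::bind_rows() could replace do.call(rbind, ...), since it fills missing columns with NA.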

0 answers