Question

I work with regularly updated XML reports that I would like to process using R and xml2.

Here's a link to an entire example file. Here is a sample of the XML:

<?xml version="1.0" ?>
<riDetailEnrolleeReport xmlns="http://vo.edge.fm.cms.hhs.gov">
    <includedFileHeader>
        <outboundFileIdentifier>f2e55625-e70e-4f9d-8278-fc5de7c04d47</outboundFileIdentifier>
        <cmsBatchIdentifier>RIP-2015-00096</cmsBatchIdentifier>
        <cmsJobIdentifier>16220</cmsJobIdentifier>
        <snapShotFileName>25032.BACKUP.D03152016T032051.dat</snapShotFileName>
        <snapShotFileHash>20d887c9a71fa920dbb91edc3d171eb64a784dd6</snapShotFileHash>
        <outboundFileGenerationDateTime>2016-03-15T15:20:54</outboundFileGenerationDateTime>
        <interfaceControlReleaseNumber>04.03.01</interfaceControlReleaseNumber>
        <edgeServerVersion>EDGEServer_14.09_01_b0186</edgeServerVersion>
        <edgeServerProcessIdentifier>8</edgeServerProcessIdentifier>
        <outboundFileTypeCode>RIDE</outboundFileTypeCode>
        <edgeServerIdentifier>2800273</edgeServerIdentifier>
        <issuerIdentifier>25032</issuerIdentifier>
    </includedFileHeader>
    <calendarYear>2015</calendarYear>
    <executionType>P</executionType>
    <includedInsuredMemberIdentifier>
        <insuredMemberIdentifier>ARS001</insuredMemberIdentifier>
        <memberMonths>12.13</memberMonths>
        <totalAllowedClaims>1000.00</totalAllowedClaims>
        <totalPaidClaims>100.00</totalPaidClaims>
        <moopAdjustedPaidClaims>100.00</moopAdjustedPaidClaims>
        <cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
        <estimatedRIPayment>0.00</estimatedRIPayment>
        <coinsurancePercentPayments>0.00</coinsurancePercentPayments>
        <includedPlanIdentifier>
            <planIdentifier>25032VA013000101</planIdentifier>
            <includedClaimIdentifier>
                <claimIdentifier>CADULT4SM00101</claimIdentifier>
                <claimPaidAmount>100.00</claimPaidAmount>
                <crossYearClaimIndicator>N</crossYearClaimIndicator>
            </includedClaimIdentifier>
        </includedPlanIdentifier>
    </includedInsuredMemberIdentifier>
    <includedInsuredMemberIdentifier>
        <insuredMemberIdentifier>ARS002</insuredMemberIdentifier>
        <memberMonths>9.17</memberMonths>
        <totalAllowedClaims>0.00</totalAllowedClaims>
        <totalPaidClaims>0.00</totalPaidClaims>
        <moopAdjustedPaidClaims>0.00</moopAdjustedPaidClaims>
        <cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
        <estimatedRIPayment>0.00</estimatedRIPayment>
        <coinsurancePercentPayments>0.00</coinsurancePercentPayments>
        <includedPlanIdentifier>
            <planIdentifier>25032VA013000101</planIdentifier>
            <includedClaimIdentifier>
                <claimIdentifier></claimIdentifier>
                <claimPaidAmount>0</claimPaidAmount>
                <crossYearClaimIndicator>N</crossYearClaimIndicator>
            </includedClaimIdentifier>
        </includedPlanIdentifier>
    </includedInsuredMemberIdentifier>
</riDetailEnrolleeReport>

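I would like to:

1. Read the XML into R
2. Locate a specific insuredMemberIdentifier
3. Extract the plan and claim information associated with the member ID in (2)
4. Store all text and values for insuredMemberIdentifier, planIdentifier, claimIdentifier, and claimPaidAmount in a data.frame, with one row for each unique claim ID (member ID to claim ID is 1-to-many)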

So far I have completed 1 and I am partway through 2:

## Step 1 ##
library(xml2)
ride <- read_xml("/Users/temp/Desktop/RIDetailEnrolleeReport.xml")

## Step 2 -- assume the insuredMemberIdentifier of interest is 'ARS001' ##
memID <- xml_find_all(ride, "//d1:insuredMemberIdentifier[text()='ARS001']", xml_ns(ride))

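[I understand that I could use xml_text() to extract the text of the element.]

After the code in Step 2 above, I tried using xml_parent() to locate the parent node of the insuredMemberIdentifier, saving it as a variable, and then repeating Step 2 on that saved node to get the claim information.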

node <- xml_parent(memID)
xml_find_all(node, "//d1:claimIdentifier", xml_ns(ride))

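But this just pulls every claimIdentifier from the entire file.

Any help or information on how to get to step 4 above would be greatly appreciated. Thanks in advance.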

Answer 1

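Apologies for the delayed response, but for posterity: import the data with xml2 as above, then parse the XML file by ID, following har07's hint. The key change is the leading dot in the XPath (".//" instead of "//"), which makes the search relative to the saved node rather than the whole document.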
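To see the difference between the two XPath forms in isolation, here is a minimal sketch on a made-up two-member document. The element names `report`, `member`, and `id` are placeholders for illustration only; just the namespace URI is taken from the real report.

```r
library(xml2)

# Tiny stand-in document (hypothetical data, same default namespace as the report)
doc <- read_xml(paste0(
  '<report xmlns="http://vo.edge.fm.cms.hhs.gov">',
  '<member><id>A</id><claimIdentifier>C1</claimIdentifier></member>',
  '<member><id>B</id><claimIdentifier>C2</claimIdentifier></member>',
  '</report>'))
ns <- xml_ns(doc)  # xml2 maps the default namespace to the prefix d1

# Parent node of the <id> element whose text is "A"
node <- xml_parent(xml_find_first(doc, "//d1:id[text()='A']", ns))

# "//" starts at the document root, so the context node is ignored
abs_hits <- xml_text(xml_find_all(node, "//d1:claimIdentifier", ns))

# ".//" starts at the context node, so only its descendants match
rel_hits <- xml_text(xml_find_all(node, ".//d1:claimIdentifier", ns))
```

Here `abs_hits` contains both claim ids while `rel_hits` contains only the one under member A, which matches the behavior described in the question.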

# output object to collect all claims
res <- data.frame(
    insuredMemberIdentifier = rep(NA, 1), 
    planIdentifier = NA, 
    claimIdentifier = NA, 
    claimPaidAmount = NA)
# vector of ids of interest
ids <- c('ARS001')
# indexing counter
starti <- 1
# loop through all ids
for (ii in seq_along(ids)) {
    # find ii-th id
    # (Step 2 from the question, generalized to each id in `ids`)
    memID <- xml_find_all(x = ride, 
        xpath = paste0("//d1:insuredMemberIdentifier[text()='", ids[ii], "']"))
    # find the parent node of the matched identifier (the member record)
    node <- xml_parent(memID)
    # as in har07's comment, find the claim ids within this node only
    cid <- xml_find_all(node, ".//d1:claimIdentifier", xml_ns(ride))
    pid <- xml_find_all(node, ".//d1:planIdentifier", xml_ns(ride))
    cpa <- xml_find_all(node, ".//d1:claimPaidAmount", xml_ns(ride))
    # add invalid data handling if necessary
    if (length(cid) != length(cpa)) {
        warning(paste("cid and cpa do not match for", ids[ii]))
        next
    }
    # collect outputs 
    res[seq_along(cid) + starti - 1, ] <- list(
        ids[ii], 
        xml_text(pid),
        xml_text(cid),
        xml_text(cpa))
    # adjust counter to add next id into correct row
    starti <- starti + length(cid)
}
res
#   insuredMemberIdentifier   planIdentifier claimIdentifier claimPaidAmount
# 1                  ARS001 25032VA013000101  CADULT4SM00101          100.00
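One caveat: the loop above pairs `pid`, `cid`, and `cpa` by position, which works for the sample but could misalign if a member has several plans with different numbers of claims. A possible alternative, sketched below, starts from each claim block and looks up its enclosing plan, so every row is built from a single claim. `claims_for_members` is a hypothetical helper name, not part of xml2, and the conversion of the paid amount to numeric is my own choice (the loop above stores text). The inline XML is an abbreviated copy of the sample from the question.

```r
library(xml2)

# Hypothetical helper (not part of xml2): one row per claim for the given member ids
claims_for_members <- function(doc, ids) {
  ns <- xml_ns(doc)
  rows <- lapply(ids, function(id) {
    # All claim blocks under the member record whose identifier equals `id`
    xp <- sprintf(
      "//d1:includedInsuredMemberIdentifier[d1:insuredMemberIdentifier='%s']//d1:includedClaimIdentifier",
      id)
    claims <- xml_find_all(doc, xp, ns)
    data.frame(
      insuredMemberIdentifier = rep(id, length(claims)),
      # ".." climbs from each claim block to the plan block that encloses it
      planIdentifier  = xml_text(xml_find_first(claims, "../d1:planIdentifier", ns)),
      claimIdentifier = xml_text(xml_find_first(claims, "./d1:claimIdentifier", ns)),
      claimPaidAmount = as.numeric(
        xml_text(xml_find_first(claims, "./d1:claimPaidAmount", ns))),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-member data frames into one
  do.call(rbind, rows)
}

# Abbreviated version of the sample report from the question
sample_xml <- paste0(
  '<riDetailEnrolleeReport xmlns="http://vo.edge.fm.cms.hhs.gov">',
  '<includedInsuredMemberIdentifier>',
  '<insuredMemberIdentifier>ARS001</insuredMemberIdentifier>',
  '<includedPlanIdentifier>',
  '<planIdentifier>25032VA013000101</planIdentifier>',
  '<includedClaimIdentifier>',
  '<claimIdentifier>CADULT4SM00101</claimIdentifier>',
  '<claimPaidAmount>100.00</claimPaidAmount>',
  '</includedClaimIdentifier>',
  '</includedPlanIdentifier>',
  '</includedInsuredMemberIdentifier>',
  '</riDetailEnrolleeReport>')

res2 <- claims_for_members(read_xml(sample_xml), c("ARS001"))
```

Because `xml_find_first()` returns one result per input node, the three value columns always have the same length as `claims`, so no separate length check is needed in this version.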
