Unfortunately, I can't parse the following example, and I couldn't find a similar solution here. For example:
<?xml version="1.0" encoding="UTF-8"?>
<FISC V="1">
<EJ ID="61017">
<DAT V="1" FN="0000000000" ZN="6101201227" TN="000000000000" T="0">
<C>
<P C="1" NM="Good1" PRC="2500" Q="2000" SM="5000" TX="1" N="1" />
<P C="4" NM="Good4" PRC="1000" Q="1000" SM="1000" TX="1" N="2" />
<M NM="CASH" SM="6000" T="0" N="3" />
<E CS="2" NO="4730" SM="6000" N="4">
<TX DTPR="0.00" TX="1" TXPR="0.00" TXSM="0" TXTY="0" />
<TX DTPR="0.00" TX="0" TXPR="0.00" TXSM="0" TXTY="0" />
</E>
</C>
<TS>20140601101226</TS>
</DAT>
<DAT V="1" FN="0000000000" ZN="6101201227" TN="000000000000" T="0">
<C>
<P C="7" NM="Good7" PRC="1200" Q="1000" SM="1200" TX="1" N="1" />
<M NM="CAH" SM="1200" T="0" N="2" />
<E CS="2" NO="4731" SM="1200" N="3">
<TX DTPR="0.00" TX="1" TXPR="0.00" TXSM="0" TXTY="0" />
<TX DTPR="0.00" TX="0" TXPR="0.00" TXSM="0" TXTY="0" />
</E>
</C>
<TS>20140601104322</TS>
</DAT>
</EJ>
</FISC>
I want to reduce it to the following:
NO NM
4730 Good1
4730 Good4
4731 Good7
NO - attribute from DAT/C/E
NM - attribute from DAT/C/P
What I tried:
require(XML)
test <- xmlParse('data.xml', encoding = 'UTF-8')
NM <- getNodeSet(test, "/FISC/EJ//P")
NO <- getNodeSet(test, "/FISC/EJ//E[@NO]")
and
require(rvest)
d <- read_html('data.xml', encoding = 'UTF-8')
ids <- data.frame(id = d %>% html_nodes("e") %>% html_attr("no"),
name = d %>% html_nodes("p") %>% html_attr("nm"))
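But each DAT node contains one or more P nodes (under C) and only one E node, so the two results have different lengths. That's why I can't bind them together.
Any help is much appreciated, thanks.
Answer 0 (score: 0)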
library(xml2)
library(purrr)
library(tibble)
read_xml('<?xml version="1.0" encoding="UTF-8"?>
<FISC V="1">
<EJ ID="61017">
<DAT V="1" FN="0000000000" ZN="6101201227" TN="000000000000" T="0">
<C>
<P C="1" NM="Good1" PRC="2500" Q="2000" SM="5000" TX="1" N="1" />
<P C="4" NM="Good4" PRC="1000" Q="1000" SM="1000" TX="1" N="2" />
<M NM="CASH" SM="6000" T="0" N="3" />
<E CS="2" NO="4730" SM="6000" N="4">
<TX DTPR="0.00" TX="1" TXPR="0.00" TXSM="0" TXTY="0" />
<TX DTPR="0.00" TX="0" TXPR="0.00" TXSM="0" TXTY="0" />
</E>
</C>
<TS>20140601101226</TS>
</DAT>
<DAT V="1" FN="0000000000" ZN="6101201227" TN="000000000000" T="0">
<C>
<P C="7" NM="Good7" PRC="1200" Q="1000" SM="1200" TX="1" N="1" />
<M NM="CAH" SM="1200" T="0" N="2" />
<E CS="2" NO="4731" SM="1200" N="3">
<TX DTPR="0.00" TX="1" TXPR="0.00" TXSM="0" TXTY="0" />
<TX DTPR="0.00" TX="0" TXPR="0.00" TXSM="0" TXTY="0" />
</E>
</C>
<TS>20140601104322</TS>
</DAT>
</EJ>
</FISC>') -> doc
# target the "E" nodes and iterate over them
map_df(xml_find_all(doc, "//DAT/C/E"), function(x) {
# target the sibling nodes of the current "E" node
p <- xml_find_all(x, "../P")
# extract the attributes you want
no <- xml_attr(x, "NO")
nm <- xml_attr(p, "NM")
# make a data frame from them
# map_df() will bind them all together for you
# (tibble() replaces the deprecated data_frame())
tibble(NO = no, NM = nm)
})
## Source: local data frame [3 x 2]
##
## NO NM
## (chr) (chr)
## 1 4730 Good1
## 2 4730 Good4
## 3 4731 Good7
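As a possible alternative, the same pairing can be sketched without purrr by starting from the "P" nodes and looking up each one's sibling "E" node. This relies on xml_find_first() returning one result per input node, so the two attribute vectors line up. The trimmed-down XML string and the names p, e and res are only illustrative, and the sketch assumes every "C" node holds exactly one "E" node, as in the sample.

```r
library(xml2)

# Trimmed-down copy of the sample document (illustrative only)
doc <- read_xml('<FISC V="1">
<EJ ID="61017">
<DAT V="1"><C>
<P C="1" NM="Good1" N="1" />
<P C="4" NM="Good4" N="2" />
<M NM="CASH" N="3" />
<E CS="2" NO="4730" N="4" />
</C></DAT>
<DAT V="1"><C>
<P C="7" NM="Good7" N="1" />
<M NM="CAH" N="2" />
<E CS="2" NO="4731" N="3" />
</C></DAT>
</EJ>
</FISC>')

# All "P" nodes, in document order
p <- xml_find_all(doc, "//DAT/C/P")

# For each "P" node, the first "E" node under the same parent;
# the result has the same length as p
e <- xml_find_first(p, "../E")

# Both attribute vectors now have one entry per "P" node
res <- data.frame(NO = xml_attr(e, "NO"),
                  NM = xml_attr(p, "NM"),
                  stringsAsFactors = FALSE)
res
```

On the sample data this should give the same three rows as above. If a "C" node had no "E" child, the corresponding NO would come back as NA rather than dropping the row.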