Modifying XML nodes using R

Time: 2015-07-07 17:38:39

Tags: xml r

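I have an input XML file (not HTML) and I want to change some of its tags. Wherever a "p" node is a child of a "step" node, I need to remove the "p" tag, but its content should be kept and assigned to the "step". The output should also be an XML file. I am using R.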

<h2>
<h4>
<stepgrp type="ordered-legal">
<figgrp-inlist>
<step>
<graphic version="1" object-id="4188" />
<p>Install the clutch spring compressor.</p>
</step>
<stepgrp2 type="unordered-bullet">
<step>
<p>One piece case  use J414202
Disc.</p>
</step>
<step>
Two piece case  use J42628 Disc.
</step>
</stepgrp2>
</figgrp-inlist>
<figgrp-inlist>
<step>
<graphic version="1" object-id="59269" />
<p>Tighten the clutch spring compressor.</p>
</step>
<step>
Remove the low/reverse clutch retainer ring.
</step>
<step>
Remove the low/reverse the clutch spring assembly.
</step>
</figgrp-inlist>
<figgrp-inlist>
<step>
<graphic version="1" object-id="4190" />
<p>Blow compressed air into the case passage to remove the
low/reverse clutch piston.</p>
</step>
</figgrp-inlist>
</stepgrp>
</h4>
</h2>

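I wrote a for loop that identifies the line positions of the "p" and "step" nodes, but I would like to make it dynamic, so that it identifies a "p" node and removes it whenever it is a child of a "step" node, while keeping its content.
Thanks!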

2 answers:

Answer 0 (score: 1)

Assuming the variable xml contains your example:

# xml <- '<h2>...'
library(XML)
doc <- xmlParse(xml, asText = TRUE)
invisible(removeNodes(doc['//step/p']))
saveXML(doc, file = tf <- tempfile(fileext = ".xml"))
# <?xml version="1.0"?>
# <h2>
#   <h4>
#     <stepgrp type="ordered-legal">
#       <figgrp-inlist>
#         <step>
#           <graphic version="1" object-id="4188"/>
#         </step>
#         <stepgrp2 type="unordered-bullet">
#           <step/>
#           <step>
# Two piece case  use J42628 Disc.
# </step>
#         </stepgrp2>
#       </figgrp-inlist>
#       <figgrp-inlist>
#         <step>
#           <graphic version="1" object-id="59269"/>
#         </step>
#         <step>
# Remove the low/reverse clutch retainer ring.
# </step>
#         <step>
# Remove the low/reverse the clutch spring assembly.
# </step>
#       </figgrp-inlist>
#       <figgrp-inlist>
#         <step>
#           <graphic version="1" object-id="4190"/>
#         </step>
#       </figgrp-inlist>
#     </stepgrp>
#   </h4>
# </h2>

The output is written to the file whose name is stored in tf (a temporary file).

Addendum

Regarding your comment, try:
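As a small illustration of why the XPath expression //step/p addresses the "dynamic" part of the question, the sketch below compares it with //p on a hypothetical miniature document (sample_xml and its root and note elements are made up for this example, not taken from the question). Only "p" elements whose parent is a "step" are selected.

```r
library(XML)

# Hypothetical miniature document: one <p> inside a <step>, one outside
sample_xml <- '<root><step><p>inside step</p></step><note><p>outside step</p></note></root>'
sample_doc <- xmlParse(sample_xml, asText = TRUE)

# All <p> elements anywhere in the document
all_p <- getNodeSet(sample_doc, "//p")

# Only <p> elements that are direct children of a <step>
step_p <- getNodeSet(sample_doc, "//step/p")

length(all_p)             # 2
sapply(step_p, xmlValue)  # "inside step"
```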

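doc <- xmlParse(xml, asText = TRUE)
nodes <- doc['//step']
# Indices of the <step> nodes that have a <p> child
idx <- which(sapply(nodes, function(x) 'p' %in% names(xmlChildren(x))))
# Save their text content before the <p> nodes are removed
vals <- sapply(nodes[idx], xmlValue)
invisible(removeNodes(doc['//step/p']))
# Re-attach the saved text directly to each affected <step>
for (x in seq_along(vals))
  newXMLTextNode(text = vals[x], doc['//step'][[idx[x]]])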

There may well be a more elegant version, though.

Answer 1 (score: 0)

Here is an answer suggested by a friend of mine; it does work!
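One possible variant, offered only as a sketch, handles each "p" node on its own instead of matching indices: it looks up the parent with xmlParent, saves the text, removes the "p", and appends the text to the parent. The input string here is a hypothetical shortened version of the question's XML. Note two assumptions: the text is appended at the end of the "step" rather than at the original position of the "p", and any markup nested inside a "p" would be flattened to plain text by xmlValue.

```r
library(XML)

# Hypothetical shortened input modelled on the question's structure
xml <- paste0(
  '<stepgrp>',
  '<step><graphic version="1" object-id="4188"/>',
  '<p>Install the clutch spring compressor.</p></step>',
  '<step>Two piece case use J42628 Disc.</step>',
  '</stepgrp>'
)
doc <- xmlParse(xml, asText = TRUE)

for (p in getNodeSet(doc, "//step/p")) {
  parent <- xmlParent(p)                 # the enclosing <step>
  txt <- xmlValue(p)                     # text content of the <p>
  removeNodes(p)                         # drop the <p> element itself
  newXMLTextNode(txt, parent = parent)   # append the saved text to the <step>
}

# Print the modified document
cat(saveXML(doc))
```

After the loop, the first "step" should contain the graphic element followed by the text, and no "p" elements should remain under any "step".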

# Read the input file line by line ('xml' is the path to the input file)
t1 <- readLines('xml')

# Collapse to one string and take everything from the first <step>
# to the last </step>, then split that region back into lines
t2 <- paste(t1, collapse = "\n")
t3 <- regmatches(t2, regexpr('<step>.+</step>', t2))
t4 <- unlist(strsplit(t3, "\n"))

# TRUE for lines of the file whose text also occurs in that region
torf <- t1 %in% t4

# Strip the <p> and </p> tags, keeping the text between them
removep <- function(x) {
  gsub("</?p>", "", x)
}

# Remove the tags only on lines inside the <step> region;
# all other lines are kept unchanged
t5 <- ifelse(torf, removep(t1), t1)
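The code above stops once t5 holds the modified lines, while the question asks for an XML file as output. A minimal sketch of that last step follows, assuming a hypothetical stand-in for t5 and a temporary output path (out_file is a made-up name). Parsing the file again with the XML package is one way to check that the text-based edit left the document well-formed.

```r
library(XML)

# Hypothetical stand-in for the t5 vector produced by the line-based approach
t5 <- c("<stepgrp>",
        "<step>",
        "Install the clutch spring compressor.",
        "</step>",
        "</stepgrp>")

# Hypothetical output path; replace with the real destination
out_file <- tempfile(fileext = ".xml")
writeLines(t5, out_file)

# Re-parse the written file; xmlParse raises an error if it is not well-formed
check_doc <- xmlParse(out_file)

# Count the <p> children of <step> that are left
remaining_p <- length(getNodeSet(check_doc, "//step/p"))
remaining_p  # 0
```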