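I have an input XML file (not HTML) and I want to change some of its tags. Wherever a "p" node is a child of a "step" node, I need to remove the "p" tag but keep its content and assign it to the "step". The output should also be an XML file. I am using R.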
<h2>
<h4>
<stepgrp type="ordered-legal">
<figgrp-inlist>
<step>
<graphic version="1" object-id="4188" />
<p>Install the clutch spring compressor.</p>
</step>
<stepgrp2 type="unordered-bullet">
<step>
<p>One piece case use J414202
Disc.</p>
</step>
<step>
Two piece case use J42628 Disc.
</step>
</stepgrp2>
</figgrp-inlist>
<figgrp-inlist>
<step>
<graphic version="1" object-id="59269" />
<p>Tighten the clutch spring compressor.</p>
</step>
<step>
Remove the low/reverse clutch retainer ring.
</step>
<step>
Remove the low/reverse the clutch spring assembly.
</step>
</figgrp-inlist>
<figgrp-inlist>
<step>
<graphic version="1" object-id="4190" />
<p>Blow compressed air into the case passage to remove the
low/reverse clutch piston.</p>
</step>
</figgrp-inlist>
</stepgrp>
</h4>
</h2>
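I wrote a for loop that identifies the line positions of the "p" and "step" nodes, but I want to make it dynamic, so that it finds each "p" node and removes it when it is a child of a "step" node, while keeping its content.
Thanks!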
Answer 0 (score: 1)
Assuming the variable `xml` contains your example:
# xml <- '<h2>...'
library(XML)
doc <- xmlParse(xml, asText = TRUE)
invisible(removeNodes(doc['//step/p']))
saveXML(doc, file = tf <- tempfile(fileext = ".xml"))
# <?xml version="1.0"?>
# <h2>
# <h4>
# <stepgrp type="ordered-legal">
# <figgrp-inlist>
# <step>
# <graphic version="1" object-id="4188"/>
# </step>
# <stepgrp2 type="unordered-bullet">
# <step/>
# <step>
# Two piece case use J42628 Disc.
# </step>
# </stepgrp2>
# </figgrp-inlist>
# <figgrp-inlist>
# <step>
# <graphic version="1" object-id="59269"/>
# </step>
# <step>
# Remove the low/reverse clutch retainer ring.
# </step>
# <step>
# Remove the low/reverse the clutch spring assembly.
# </step>
# </figgrp-inlist>
# <figgrp-inlist>
# <step>
# <graphic version="1" object-id="4190"/>
# </step>
# </figgrp-inlist>
# </stepgrp>
# </h4>
# </h2>
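The output is written to a temporary file whose path is stored in `tf`. Note that this also drops the text inside each "p" (see the empty `<step/>` above).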
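The XPath expression "//step/p" is what makes the selection dynamic: it matches only "p" elements whose parent is "step", wherever they occur in the tree. A small check on a hypothetical cut-down input (not the question's full file) might look like this:

```r
library(XML)

# Hypothetical cut-down input: one <p> inside a <step>, one outside any <step>
xml_small <- '<stepgrp><step><p>Install the clutch spring compressor.</p></step><stepgrp2><p>Not in a step</p></stepgrp2></stepgrp>'
doc_small <- xmlParse(xml_small, asText = TRUE)

# '//step/p' selects only <p> elements whose parent is <step>
n_step_p <- length(doc_small['//step/p'])  # 1
# '//p' selects every <p> in the document
n_all_p <- length(doc_small['//p'])        # 2
cat(n_step_p, n_all_p, "\n")
```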
Regarding your comment, try:
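library(XML)
doc <- xmlParse(xml, asText = TRUE)
nodes <- doc['//step']
# positions of the "step" nodes that have a "p" child
idx <- which(sapply(nodes, function(x) 'p' %in% names(xmlChildren(x))))
# save the text of those "step" nodes (this includes the text inside "p")
vals <- sapply(nodes[idx], xmlValue)
invisible(removeNodes(doc['//step/p']))
# add the saved text back as a plain text child of each "step"
for (i in seq_along(vals))
  newXMLTextNode(text = vals[i], parent = nodes[[idx[i]]])
saveXML(doc, file = tf <- tempfile(fileext = ".xml"))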
There may well be a more elegant version, though.
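One possibly more compact variant of the same idea avoids the index bookkeeping: loop over the nodes matched by "//step/p", save each one's text with `xmlValue`, remove it, and append the text to its parent with `newXMLTextNode`. This is a sketch, not a tested drop-in: the function name `unwrap_step_p` and the sample string are hypothetical, and because the text is appended at the end of the "step", it assumes (as in the question's sample) that each "p" is the last child of its "step".

```r
library(XML)

# Hypothetical helper: replace every <p> that is a direct child of <step>
# by its own text. The text is appended as the last child of the <step>,
# so any siblings that originally followed the <p> would end up before it.
unwrap_step_p <- function(doc) {
  for (p in doc['//step/p']) {
    step <- xmlParent(p)
    txt <- xmlValue(p)
    removeNodes(p)
    newXMLTextNode(txt, parent = step)
  }
  invisible(doc)
}

# usage on a hypothetical cut-down sample of the question's input
sample_xml <- '<stepgrp><step><graphic object-id="4188"/><p>Install the clutch spring compressor.</p></step></stepgrp>'
doc <- xmlParse(sample_xml, asText = TRUE)
unwrap_step_p(doc)
cat(saveXML(doc))
```

Writing the result to disk would then be the same `saveXML(doc, file = ...)` call used above.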
Answer 1 (score: 0)
Please see the answer my friend came up with; it does work on the sample above!
t1 <- readLines('xml')   # 'xml' is the path to the input file
t2 <- paste(t1, collapse = "\n")
# greedy match: everything from the first <step> to the last </step>
t3 <- regmatches(t2, regexpr('<step>.+</step>', t2))
t4 <- unlist(strsplit(t3, "\n"))
# flag the input lines that appear inside that span
torf <- t1 %in% t4
# strip <p> and </p> only on the flagged lines; other lines are kept as-is
t5 <- t1
t5[torf] <- gsub("</?p>", "", t1[torf])
writeLines(t5, "output.xml")  # hypothetical output file name
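The approach above flags every line between the first "<step>" and the last "</step>", so a "p" that sits in that span but outside any "step" (for example directly under "stepgrp2") would also be unwrapped. A minimal base-R sketch that instead tracks, line by line, whether it is currently inside a "step" block is shown below. It assumes, as in the sample, that "<step>" and "</step>" carry no attributes and are never nested; the function name `strip_p_in_step` and the file names in the usage comment are hypothetical. It would still strip a "p" nested more deeply inside a "step", so an XML parser remains the more robust route.

```r
# Hypothetical helper: strip <p> and </p> only on lines that lie inside a
# <step> ... </step> block. Assumes <step> tags have no attributes and are
# not nested; a <p> nested deeper inside a <step> would also be stripped.
strip_p_in_step <- function(lines) {
  inside <- FALSE
  out <- lines
  for (i in seq_along(lines)) {
    if (grepl("<step>", lines[i], fixed = TRUE)) inside <- TRUE
    if (inside) out[i] <- gsub("</?p>", "", lines[i])
    if (grepl("</step>", lines[i], fixed = TRUE)) inside <- FALSE
  }
  out
}

# usage (hypothetical paths):
# writeLines(strip_p_in_step(readLines("input.xml")), "output.xml")
```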