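I am trying to remove p tags that contain no text. If a p tag contains text but has no parent tag, I want to create a parent div tag for it. I am also trying to convert from org.jsoup.nodes.Document to org.w3c.dom.Document.
Is this possible, or is there a shortcut solution?
Java code: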
private void modifyMediaVariantContent(String html) {
    org.jsoup.nodes.Document doc = Jsoup.parse(html);
    for (org.jsoup.nodes.Element element : doc.select("*")) {
        if (!element.hasText() && element.isBlock()) {
            element.remove();
        }
    }
}
HTML string value:
Before:
<p id="Id44">see the image and see the color... ?</p>
<p id="Id40"></p>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<p id="Id28"></p>
<p id="Id-1"></p>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
<p id="Id-320046-3-21"></p>
After (result):
<div>
<div id = "passageContent">
<p id="Id44">see the image and see the color... ?</p>
<div>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
</div>
Or this result:
<div>
<p id="Id44">see the image and see the color... ?</p>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
</div>
Answer 0 (score: 1)
Take a look at the following snippet:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;

public class Test {
    public static void main(String[] args) {
        try {
            String html = "<p id=\"Id44\">see the image and see the color... ?</p>\r\n" + "<p id=\"Id40\"></p>\r\n"
                    + "<div id=\"Id87\" style=\"display:inline-block\">\r\n"
                    + "<video id=\"Id30\" src=\"http://Id3.qa.cete.us/117973/video.mp4\"></video>\r\n" + "</div>\r\n"
                    + "<p id=\"Id28\"></p>\r\n" + "<p id=\"Id-1\"></p>\r\n" + "<div id =\"Id21\">\r\n"
                    + "<img id=\"img_44186\" src=\"/129884/apple.jpg\" />\r\n" + "</div>\r\n" + "<p id=\"Id-320046-3-21\"></p>";
            new Test().modifyMediaVariantContent(html);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void modifyMediaVariantContent(String html) {
        org.jsoup.nodes.Document doc = Jsoup.parse(html);
        // getElementsByTag returns a separate list, so the tree can be changed while iterating over it.
        for (Element element : doc.getElementsByTag("p")) {
            if (!element.hasText() && element.isBlock()) {
                // Drop paragraphs that contain no text.
                element.remove();
            } else if (element.hasText() && element.parent() == doc.body()) {
                // Wrap a top-level paragraph that has text in a new div.
                Element replacement = new Element(Tag.valueOf("div"), "");
                replacement.appendChild(element.clone());
                element.replaceWith(replacement);
            }
        }
        System.out.println(doc.body().html());
    }
}
This prints the following:
<div>
<p id="Id44">see the image and see the color... ?</p>
</div>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id="Id21">
<img id="img_44186" src="/129884/apple.jpg">
</div>
To convert the jsoup document to an org.w3c.dom.Document, use org.jsoup.helper.W3CDom:
W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(doc);
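Once the content is available as an org.w3c.dom.Document, the same cleanup can also be done with the standard JDK DOM API. The following is only a rough sketch under some assumptions: the class name W3cEmptyParagraphs and both method names are made up for illustration, and to stay free of dependencies the document is parsed from a well-formed XHTML string with DocumentBuilder as a stand-in for the result of W3CDom.fromJsoup. Like the jsoup check above, it treats a p with no non-whitespace text as empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of jsoup or the JDK.
final class W3cEmptyParagraphs {

    // Parses a well-formed XHTML string. This stands in for the Document
    // that W3CDom.fromJsoup would return (assumption made for this sketch).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Removes every p element whose text content is empty or whitespace only
    // and returns how many elements were removed.
    static int removeEmptyParagraphs(Document doc) {
        NodeList paragraphs = doc.getElementsByTagName("p");
        // The NodeList is live, so collect the matches before removing anything.
        List<Element> toRemove = new ArrayList<>();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            if (p.getTextContent().trim().isEmpty()) {
                toRemove.add(p);
            }
        }
        for (Element p : toRemove) {
            p.getParentNode().removeChild(p);
        }
        return toRemove.size();
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p id=\"Id44\">see the image</p><p id=\"Id40\"></p></div>");
        System.out.println("Removed: " + removeEmptyParagraphs(doc));
        System.out.println("Remaining p elements: " + doc.getElementsByTagName("p").getLength());
    }
}
```

In a real project the parse step would be replaced by the W3CDom conversion shown above; the removal method only relies on org.w3c.dom types.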
Answer 1 (score: 0)
You can use a regular expression:
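// Strings are immutable, so keep the returned value; [^\"]* stops the match at the closing quote of the id.
html = html.replaceAll("<p id=\"[^\"]*\"></p>\\s*", "");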
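For completeness, here is a slightly more tolerant version of the regex idea as a self-contained sketch. The class name EmptyParagraphRegex and the method removeEmptyParagraphs are hypothetical, chosen only for this example. The pattern accepts any attributes on the p tag, allows whitespace between the tags, and also swallows the line break that follows. Regular expressions remain fragile on HTML in general (for instance an attribute value containing ">" would break this), so this is probably only suitable for simple, predictable markup like the sample above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class EmptyParagraphRegex {

    // <p ...> followed by optional whitespace, </p>, and any trailing whitespace.
    // \b keeps tags such as <pre> or <param> from matching.
    private static final Pattern EMPTY_P =
            Pattern.compile("<p\\b[^>]*>\\s*</p>\\s*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the input with all empty p elements removed.
    static String removeEmptyParagraphs(String html) {
        return EMPTY_P.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p id=\"Id44\">see the image and see the color... ?</p>\r\n"
                + "<p id=\"Id40\"></p>\r\n"
                + "<div id=\"Id21\"><img id=\"img_44186\" src=\"/129884/apple.jpg\" /></div>\r\n"
                + "<p id=\"Id-1\"></p>";
        System.out.println(removeEmptyParagraphs(html));
    }
}
```

Unlike the jsoup approach, this only strips the empty paragraphs; it does not wrap anything in a div.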