Remove empty p tags from a String variable using Java?

Time: 2017-01-06 15:48:16

Tags: java jsoup

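I am trying to remove p tags that do not contain any text. If a p tag contains text but has no parent tag, I want to wrap it in a parent DIV tag. I am also trying to convert from org.jsoup.nodes.Document to org.w3c.dom.Document.

Is this possible, or is there a shortcut solution?

Java code: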

private void modifyMediaVariantContent(String html) {

    org.jsoup.nodes.Document doc = Jsoup.parse(html);

    for (org.jsoup.nodes.Element element : doc.select("*")) {
        if (!element.hasText() && element.isBlock()) {
            element.remove();
        }
    }
}

HTML string value:

Before:

<p id="Id44">see the image and see the color... ?</p>
<p id="Id40"></p>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<p id="Id28"></p>
<p id="Id-1"></p>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
<p id="Id-320046-3-21"></p>

After (result):

<div>
<div id = "passageContent">
<p id="Id44">see the image and see the color... ?</p>
</div>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
</div>

Or result:

<div>
<p id="Id44">see the image and see the color... ?</p>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id ="Id21">
<img id="img_44186" src="/129884/apple.jpg" />
</div>
</div>

2 answers:

Answer 0: (score: 1)

Take a look at the following code snippet:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;

public class Test {

    public static void main(String[] args) {
        try {
            String html = "<p id=\"Id44\">see the image and see the color... ?</p>\r\n" + "<p id=\"Id40\"></p>\r\n"
                    + "<div id=\"Id87\" style=\"display:inline-block\">\r\n"
                    + "<video id=\"Id30\" src=\"http://Id3.qa.cete.us/117973/video.mp4\"></video>\r\n" + "</div>\r\n"
                    + "<p id=\"Id28\"></p>\r\n" + "<p id=\"Id-1\"></p>\r\n" + "<div id =\"Id21\">\r\n"
                    + "<img id=\"img_44186\" src=\"/129884/apple.jpg\" />\r\n" + "</div>\r\n" + "<p id=\"Id-320046-3-21\"></p>";
            new Test().modifyMediaVariantContent(html);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void modifyMediaVariantContent(String html) {
        org.jsoup.nodes.Document doc = Jsoup.parse(html);
        // getElementsByTag returns its own list, so elements can be removed or replaced inside the loop.
        for (Element element : doc.getElementsByTag("p")) {
            if (!element.hasText() && element.isBlock()) {
                // Drop <p> elements that contain no text.
                element.remove();
            }
            if (element.hasText() && element.parent() == doc.body()) {
                // Wrap a top-level <p> that has text in a new <div>.
                Element replacement = new Element(Tag.valueOf("div"), "");
                replacement.appendChild(element.clone());
                element.replaceWith(replacement);
            }
        }

        // Print the cleaned-up body markup.
        System.out.println(doc.body().html());
    }
}

This outputs the following:

<div>
 <p id="Id44">see the image and see the color... ?</p>
</div>  
<div id="Id87" style="display:inline-block"> 
 <video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video> 
</div>   
<div id="Id21"> 
 <img id="img_44186" src="/129884/apple.jpg"> 
</div>

To convert a Jsoup document to org.w3c.dom.Document, use org.jsoup.helper.W3CDom:

W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(doc);

Answer 1: (score: 0)

You can use a RegExp:

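// Strings are immutable, so keep the returned value. [^\"]* stops at the closing quote, and \\R? also covers \r\n or a missing line break.
html = html.replaceAll("<p id=\"[^\"]*\"></p>\\R?", "");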
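
As a rough, self-contained sketch of the regex idea using only the standard library: the class name `EmptyParagraphStripper` and the method `stripEmptyParagraphs` are made up for illustration and are not part of any library. The pattern assumes that attribute values never contain a `>` character and that "empty" means the p tag holds nothing but whitespace. A regex is not a general HTML parser, so this is only reasonable for simple, predictable markup like the sample in the question.

```java
import java.util.regex.Pattern;

public class EmptyParagraphStripper {

    // Matches a <p> element with optional attributes and only whitespace inside,
    // plus one trailing line break (\n, \r\n, ...) if there is one.
    // \b keeps the pattern from matching other tags such as <pre>.
    private static final Pattern EMPTY_P =
            Pattern.compile("<p\\b[^>]*>\\s*</p>\\R?", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns a copy of html without the empty <p> elements.
    public static String stripEmptyParagraphs(String html) {
        return EMPTY_P.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened version of the sample markup from the question.
        String html = "<p id=\"Id44\">see the image and see the color... ?</p>\r\n"
                + "<p id=\"Id40\"></p>\r\n"
                + "<div id=\"Id21\">\r\n"
                + "<img id=\"img_44186\" src=\"/129884/apple.jpg\" />\r\n"
                + "</div>\r\n"
                + "<p id=\"Id-320046-3-21\"></p>";
        System.out.println(stripEmptyParagraphs(html));
    }
}
```

Unlike the Jsoup approach, this leaves the rest of the string untouched, including its original line breaks and attribute spacing, but it does not add a wrapping div or repair malformed markup.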