Why doesn't jsoup's .select("div.class") work for CNN.com?

Time: 2019-05-16 11:43:16

Tags: java html css jsoup

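I have been trying to figure out why jsoup's .select("div.zn-body__paragraph") does not work on some CNN articles. For articles like this, it fails even though the markup is clearly present, while for articles like this it works. Here is the full code I wrote: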


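    // Collects the text of every paragraph div in the article at the given link.
    public static String getContentCNN(String link) throws IOException {
        // StringBuilder avoids allocating a new String on every loop iteration.
        StringBuilder content = new StringBuilder();

        Elements paragraphs = getDocsCNN(link).select("div.zn-body__paragraph");

        for (Element p : paragraphs) {
            content.append(p.text()).append("\n\n");
        }

        return content.toString();
    }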

Both articles contain divs with this class:


<div class="zn-body__paragraph">Nadler on Wednesday said he didn't know the White House's motives, but he would not allow the White House to try to claim that the President cannot be held accountable.</div>

<div class="zn-body__paragraph">"I don't know whether they're trying to taunt us toward an impeachment or anything else," Nadler said. "All I know is they have made a preposterous claim."</div>

So far I have tried div#class, div[class], and getElementsByClass("class").

Thanks.

Edit: here is the source code for getDocsCNN():


    public static Document getDocsCNN(String link) throws IOException {
        return Jsoup.connect(link).userAgent("Mozilla").timeout(6000).get();
    }
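
One way to narrow this down is to check whether the class name appears at all in the HTML that the server returns. jsoup parses the fetched markup but does not execute JavaScript, so if a page inserts its paragraphs with scripts, the selector would have nothing to match. The following is only a diagnostic sketch under that assumption, not a confirmed explanation for the CNN pages. `MarkupCheck` and `countClassOccurrences` are hypothetical names, and the helper uses plain string matching from the standard library rather than any jsoup API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: counts how many class attributes in
// a raw HTML string contain the given class name as a whole token.
class MarkupCheck {

    static int countClassOccurrences(String html, String className) {
        // Only double-quoted class="..." attributes are considered in this sketch.
        Pattern classAttr = Pattern.compile("class\\s*=\\s*\"([^\"]*)\"");
        Matcher m = classAttr.matcher(html);
        int count = 0;
        while (m.find()) {
            // A class attribute may hold several space-separated tokens.
            for (String token : m.group(1).trim().split("\\s+")) {
                if (token.equals(className)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample markup shaped like the divs quoted in the question.
        String sample = "<div class=\"zn-body__paragraph\">First</div>"
                + "<div class=\"zn-body__paragraph speakable\">Second</div>"
                + "<div class=\"other\">Third</div>";
        System.out.println(countClassOccurrences(sample, "zn-body__paragraph")); // prints 2
    }
}
```

To apply this to a real page, the raw markup could be passed in as `getDocsCNN(link).outerHtml()`. A result of 0 for a failing article would suggest that the divs seen in the browser are not part of the response jsoup receives.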

0 answers:

No answers