Question

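I've been trying to figure out why jsoup's .select("div.zn-body__paragraph") fails on some CNN articles. For articles like this one it returns nothing even though the markup is clearly there, while for articles like this one it works. Here is the full code I wrote: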


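    public static String getContentCNN(String link) throws IOException {
        StringBuilder content = new StringBuilder();

        // Select every paragraph container in the article body.
        Elements paragraphs = getDocsCNN(link).select("div.zn-body__paragraph");

        // Append each paragraph's text, followed by a blank line.
        for (Element p : paragraphs) {
            content.append(p.text()).append("\n\n");
        }

        return content.toString();
    }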

Both articles have div elements with this class:


    <div class="zn-body__paragraph">Nadler on Wednesday said he didn't know the White House's motives, but he would not allow the White House to try to claim that the President cannot be held accountable.</div>

    <div class="zn-body__paragraph">"I don't know whether they're trying to taunt us toward an impeachment or anything else," Nadler said. "All I know is they have made a preposterous claim."</div>

So far I have tried div#class, div[class], and getElementsByClass("class").

Thanks.

Edit: here is the source code of getDocsCNN():


    public static Document getDocsCNN(String link) throws IOException {
        return Jsoup.connect(link).userAgent("Mozilla").timeout(6000).get();
    }
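One check that might help narrow this down is whether the class name appears at all in the raw HTML the server returns for each link. The sketch below is only a diagnostic under stated assumptions, not a fix and not a claim about the cause: `RawHtmlCheck`, `countOccurrences`, and `fetchRaw` are hypothetical names, and it uses the JDK's built-in `java.net.http` client (Java 11+) instead of jsoup so that it runs without extra dependencies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class, not part of the original code.
class RawHtmlCheck {

    // Counts non-overlapping occurrences of needle in html.
    static int countOccurrences(String html, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = html.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Fetches the raw response body, using the same user agent string
    // and a 6000 ms timeout as in getDocsCNN().
    static String fetchRaw(String link) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofMillis(6000))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(link))
                .header("User-Agent", "Mozilla")
                .timeout(Duration.ofMillis(6000))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Pass an article URL as the first argument; without one,
        // fall back to a small inline sample so the program still runs.
        String html = args.length > 0
                ? fetchRaw(args[0])
                : "<div class=\"zn-body__paragraph\">sample</div>";
        System.out.println("occurrences of zn-body__paragraph: "
                + countOccurrences(html, "zn-body__paragraph"));
    }
}
```

Running it once with the link that works and once with the link that fails shows whether the two responses actually differ. If the count is zero for the failing link, the paragraphs are not in the document jsoup receives, so no selector could match them there.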

Why doesn't .select("div.class") in jsoup work for CNN.com?

0 answers: