Jsoup from div

Date: 2017-05-18 02:12:05

Tags: java jsoup

The page contains several divs with the same class, like this:

<div class="author-quote">
    <a href="#">Maldives</a>
</div>

Each div contains an <a> tag, and the text inside each <a> tag is different.

Here is my Java method:

private String get() throws InterruptedException{
        final CountDownLatch latch = new CountDownLatch(1);
        final List<String> value = new ArrayList<>();

        Thread thread = new Thread(new Runnable() {
            Elements elements;
            @Override
            public void run() {
                try {
                    Document doc = Jsoup.connect(WEB_URL).get();

                    elements = doc.select("div.author-quote");
                    value.add(elements.text()); // added the whole class
                    latch.countDown();
                    } catch (IOException e) {
                    Log.e(TAG,e.getMessage());
                }
            }// end run
        });

        thread.start();
        latch.await();
        return value.get(0);
    }

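It gets all the text from every div with the class author-quote. This is the output:

Pakistan Maldives Lichtenstein China Panama

But I only want one of them, chosen at random. How can I do that?

Additional information: some <a> tags contain multi-word names such as Republic of Ireland and Guinea-Bissau, and some also contain symbols, such as Dominican Republic.

Update: I could separate them with some string manipulation, but I was hoping to do this with Jsoup's element selection tools.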

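2 answers:

Answer 0: (score: 0)

You can split value.get(0) on the delimiter " " (which produces an array of strings), then use http://jsfiddle.net/9fR23/464/ to index into the array and pick a random String object.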

...
...
...
String[] tempArray = value.get(0).split(" ");
return tempArray[new Random().nextInt(tempArray.length)];
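
The question points out that some names contain several words, and splitting on a space breaks those apart. Below is a minimal sketch of that pitfall in plain Java. The class name SplitPitfall, the helper pickRandom, and the sample strings are assumptions made for illustration and are not part of the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SplitPitfall {
    // Picks one entry uniformly at random; the Random is passed in so callers can seed it.
    static String pickRandom(List<String> items, Random random) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("items must not be empty");
        }
        return items.get(random.nextInt(items.size()));
    }

    public static void main(String[] args) {
        // Sample data: the text of three divs joined by spaces, as in the question's output.
        String joined = "Pakistan Republic of Ireland Guinea-Bissau";

        // Splitting on a space yields one fragment per word, not one per name.
        String[] fragments = joined.split(" ");
        System.out.println(fragments.length + " fragments: " + Arrays.toString(fragments));

        // Keeping one string per element preserves multi-word names.
        List<String> names = Arrays.asList("Pakistan", "Republic of Ireland", "Guinea-Bissau");
        System.out.println(pickRandom(names, new Random()));
    }
}
```

Collecting one string per element, rather than one joined string, is what makes the random pick safe for names like Republic of Ireland.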

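Update

Based on the update to your post, an alternative solution starts on the web page side: wrap all the divs whose class attribute is author-quote in one main parent element (if that is not already done). You can then select this parent div, iterate over the parentDiv child nodes, collect the text of each one separately, and add it to the ArrayList.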

private String get() throws InterruptedException{
        final CountDownLatch latch = new CountDownLatch(1);
        final List<String> value = new ArrayList<>();

        Thread thread = new Thread(new Runnable() {
        Element parentDiv;
        @Override
        public void run() {
            try {
                Document doc = Jsoup.connect(WEB_URL).get();

                // Placeholder from the original answer: select the wrapping parent div here
                parentDiv = getTheParentDivSomehow();
                // children() returns only element children, so whitespace text nodes are skipped
                for (Element child : parentDiv.children()) {
                    // text() returns the text of the nested <a> tag
                    value.add(child.text());
                }
            } catch (IOException e) {
                Log.e(TAG,e.getMessage());
            } finally {
                // Always release the latch, otherwise await() blocks forever on failure
                latch.countDown();
            }
        }// end run
        });

        thread.start();
        latch.await();
        if (value.isEmpty()) {
            return null; // nothing was collected
        }
        // A random index is enough; shuffling the list first is not needed
        return value.get(new Random().nextInt(value.size()));
}

Answer 1: (score: 0)

private String get() throws InterruptedException{
    final CountDownLatch latch = new CountDownLatch(1);
    final List<String> value = new ArrayList<>();

    Thread thread = new Thread(new Runnable() {
        Elements elements;
        @Override
        public void run() {
            try {
                Document doc = Jsoup.connect(WEB_URL).get();
                // select() returns an Elements list with one entry per matching div
                elements = doc.select("div.author-quote");
                if (!elements.isEmpty()) {
                    // nextInt(n) returns a random index from 0 (inclusive) to n (exclusive)
                    int randomIndex = new Random().nextInt(elements.size());
                    value.add(elements.get(randomIndex).text()); // add only the text of the random Element
                }
            } catch (IOException e) {
                Log.e(TAG,e.getMessage());
            } finally {
                latch.countDown(); // release the latch even on failure so await() cannot block forever
            }
        }// end run
    });

    thread.start();
    latch.await();
    return value.isEmpty() ? null : value.get(0); // null if nothing was fetched
}
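
All the snippets on this page wait on a CountDownLatch. That wait only ends if countDown() actually runs, so an IOException thrown before it would leave the caller blocked. The following is a rough sketch of the same thread-plus-latch pattern with countDown() moved into a finally block and a timeout on the wait. The class name LatchSketch, the fetch helper, and the use of a Callable in place of the Jsoup call are assumptions for illustration, not part of either answer.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSketch {
    // Runs the task on a worker thread and waits for it.
    // Returns null if the task failed or did not finish within timeoutMs.
    static String fetch(Callable<String> task, long timeoutMs) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        final List<String> value = new CopyOnWriteArrayList<>();

        Thread thread = new Thread(() -> {
            try {
                value.add(task.call()); // stands in for the Jsoup fetch and selection
            } catch (Exception e) {
                System.err.println("fetch failed: " + e.getMessage());
            } finally {
                latch.countDown(); // runs on success and on failure
            }
        });

        thread.start();
        boolean finished = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        return (finished && !value.isEmpty()) ? value.get(0) : null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fetch(() -> "Maldives", 1000));
        System.out.println(fetch(() -> { throw new java.io.IOException("network down"); }, 1000));
    }
}
```

With this shape the caller gets a null result when the fetch fails instead of waiting indefinitely.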