Can't figure out how to scrape specific text - using Jsoup

Time: 2017-04-06 08:45:09

Tags: java web web-scraping jsoup

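I just started learning how to use JSoup. I think I have managed to select this part of the HTML: with .select("span.title").text() I successfully pulled out "DARK SOULS III Deluxe Edition". But I am trying to get the prices, in this case $84.98 and $55.23. I tried .select("div.col search_price responsive_secondrow").text(), but it comes back blank. Could anyone help me figure out how to extract that part? Thanks in advance! Here is the HTML for that part of the page.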

The full HTML is at view-source:http://store.steampowered.com/search/?filter=topsellers

<a href="http://store.steampowered.com/sub/94174/?snr=1_7_7_topsellers_150_1"  data-ds-packageid="94174" data-ds-appid="374320,442010"onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;sub&quot;,&quot;id&quot;:94174,&quot;public&quot;:1,&quot;v6&quot;:1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" class="search_result_row ds_collapse_flag" >
                <div class="col search_capsule"><img src="http://cdn.edgecast.steamstatic.com/steam/subs/94174/capsule_sm_120.jpg?t=1476893662"></div>
                <div class="responsive_search_name_combined">
                    <div class="col search_name ellipsis">
                        <span class="title">DARK SOULS III Deluxe Edition</span>
                        <p>
                            <span class="platform_img win"></span>                          </p>
                    </div>
                    <div class="col search_released responsive_secondrow">12 Apr, 2016</div>
                    <div class="col search_reviewscore responsive_secondrow">
                                                        <span class="search_review_summary positive" data-store-tooltip="Very Positive&lt;br&gt;86% of the 29,204 user reviews for games in this bundle are positive.">
                            </span>
                                                </div>


                    <div class="col search_price_discount_combined responsive_secondrow">
                        <div class="col search_discount responsive_secondrow">
                            <span>-35%</span>
                        </div>
                        <div class="col search_price discounted responsive_secondrow">
                            <span style="color: #888888;"><strike>$84.98</strike></span><br>$55.23                          </div>
                    </div>
                </div>


                <div style="clear: left;"></div>
            </a>

1 Answer:

Answer 0 (score: 0)

Use doc.select("a.search_result_row") instead:

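import java.io.IOException;
import java.util.Iterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSteamTest {

    public static void main(String[] args) throws IOException {

        // Fetch the top-sellers page with a browser-like user agent
        Document doc = Jsoup.connect("http://store.steampowered.com/search/?filter=topsellers").userAgent("Mozilla")
                .get();

        // Each search result is an <a class="search_result_row ..."> element
        Elements table = doc.select("a.search_result_row");

        Iterator<Element> ite = table.iterator();
        while (ite.hasNext()) {
            Element element = ite.next();
            // text() joins all text inside the row: title, release date, discount and prices
            System.out.println(element.text());
        }
    }
}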
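Building on the loop above, here is a minimal sketch that reads the title and the price cell from each row separately instead of printing the whole row text. It parses a trimmed copy of the question's HTML with `Jsoup.parse` so it runs offline; the class name `SteamRowParser` and the trimmed markup are assumptions for illustration, and the live page's markup may differ from the 2017 snapshot.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SteamRowParser {

    // Trimmed copy of one result row from the question (hypothetical test fixture)
    static final String ROW_HTML =
            "<a class=\"search_result_row ds_collapse_flag\">"
          + "<div class=\"responsive_search_name_combined\">"
          + "<div class=\"col search_name ellipsis\"><span class=\"title\">DARK SOULS III Deluxe Edition</span></div>"
          + "<div class=\"col search_price_discount_combined responsive_secondrow\">"
          + "<div class=\"col search_discount responsive_secondrow\"><span>-35%</span></div>"
          + "<div class=\"col search_price discounted responsive_secondrow\">"
          + "<span style=\"color: #888888;\"><strike>$84.98</strike></span><br>$55.23</div>"
          + "</div></div></a>";

    // Returns "title | price text" for every a.search_result_row in the document
    static List<String> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> rows = new ArrayList<>();
        for (Element row : doc.select("a.search_result_row")) {
            // Scope each lookup to the current row so titles and prices stay paired
            String title = row.select("span.title").text();
            String price = row.select("div.search_price").text();
            rows.add(title + " | " + price);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints: DARK SOULS III Deluxe Edition | $84.98 $55.23
        parseRows(ROW_HTML).forEach(System.out::println);
    }
}
```

Note that the price cell's `text()` contains both the struck-out original price and the discounted price; the original one lives inside the `<strike>` element if you need them apart.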

You will get a list like this:

PLAYERUNKNOWN'S BATTLEGROUNDS 23 Mar, 2017 29,99€
Steel Division: Normandy 44 Coming Soon 39,99€
DARK SOULS™ III 11 Apr, 2016 -50% 59,99€ 29,99€

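Your specific problem comes from the div having multiple classes. In a selector, a space means "descendant of", not "also has this class".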

To select an element that has several classes, join them with dots instead of spaces:

doc.select("div.col.search_price.discounted.responsive_secondrow");
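To see the difference concretely, the sketch below runs both selectors against the price cell from the question. With spaces, jsoup reads `search_price` and `responsive_secondrow` as tag names of descendant elements, so nothing matches and `text()` is empty. With dots, all the classes must be on the same element. The last two methods separate the prices: the original price is inside `<strike>`, while the discounted price is a text node directly inside the div, which `ownText()` returns without the text of child elements. The class name `PriceSelectorDemo` and the trimmed markup are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PriceSelectorDemo {

    // Trimmed price cell from the question (hypothetical fixture)
    static final String PRICE_HTML =
            "<div class=\"col search_price discounted responsive_secondrow\">"
          + "<span style=\"color: #888888;\"><strike>$84.98</strike></span><br>$55.23</div>";

    // Spaces are descendant combinators: looks for <search_price> tags inside div.col, so matches nothing
    static String withSpaces(Document doc) {
        return doc.select("div.col search_price responsive_secondrow").text();
    }

    // Dots require every listed class on the same element
    static String withDots(Document doc) {
        return doc.select("div.col.search_price.discounted.responsive_secondrow").text();
    }

    // The original (struck-out) price sits inside the <strike> element
    static String originalPrice(Document doc) {
        return doc.select("div.search_price strike").text();
    }

    // The discounted price is a text node owned directly by the div; ownText() skips child elements
    static String discountedPrice(Document doc) {
        Element cell = doc.select("div.search_price").first();
        return cell == null ? "" : cell.ownText().trim();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(PRICE_HTML);
        System.out.println("spaces:     '" + withSpaces(doc) + "'");
        System.out.println("dots:       '" + withDots(doc) + "'");
        System.out.println("original:   " + originalPrice(doc));
        System.out.println("discounted: " + discountedPrice(doc));
    }
}
```

Rows without a discount have no `<strike>` element, so `originalPrice` would return an empty string there and `ownText()` alone would hold the price.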

Take a look at this question: JSOUP get element with multiple classes