Parsing search form results with jsoup

Time: 2014-08-21 09:44:04

Tags: jquery parsing http html-parsing jsoup

I want to load the search results page into doc, but doc ends up containing the startUrl page instead.

doc=Jsoup.connect(startUrl).data("search_text", search)
                            .data("charset", "utf-8")
                            .data("top-search-button", "submit")
                            .timeout(0)
                            .post();

Search form HTML:

<div class="b-top-search">
    <form method="post" action="http://startUrl/search/" id="globalSearch" name="globalSearch">
        <div class="b-top-search-box">
            <i class="icon top-search-spinner"></i>
            <input type="text" class="top-search-input unfocus" value="Insert search text" autocomplete="off" id="g-search-input" name="search_text" longdesc="Insert search text">
            <button class="top-search-button" type="submit"><span>Find</span></button>
            <input type="hidden" name="charset" value="utf-8">
        </div>

        <!--Top-search-results-->
        <div class="b-top-search-results" id="g-search-result">
        <ul class="b-top-search-results__list"></ul>
        </div>
        <!--/Top-search-results-->
    </form>
</div>

1 answer:

Answer 0 (score: 0)

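The data("top-search-button", "submit") call is not needed, because top-search-button is the button's class, not its name. The button has no name attribute, so the server does not expect a value for it. The server may also be expecting some metadata about the client (the User-Agent and Referer headers). Try this: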

doc=Jsoup.connect(startUrl).userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
                           .referrer("http://www.google.com")
                           .data("search_text", search)
                           .data("charset", "utf-8")
                           .timeout(0)
                           .post();
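
To illustrate why the data("top-search-button", "submit") entry can be dropped, here is a minimal sketch of the form-encoded body built from this form's named controls. It uses only the Java standard library, not jsoup. The class FormBodySketch and its methods encode and searchFields are hypothetical names invented for this illustration; only the field names search_text and charset come from the form above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of jsoup.
class FormBodySketch {

    /** Encodes name/value pairs as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /**
     * Collects the controls of the globalSearch form that carry a name attribute.
     * The submit button only has a class, so it contributes no entry.
     */
    static Map<String, String> searchFields(String search) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("search_text", search);
        fields.put("charset", "utf-8");
        return fields;
    }

    public static void main(String[] args) {
        // Prints: search_text=java+html+parser&charset=utf-8
        System.out.println(encode(searchFields("java html parser")));
    }
}
```

The body contains only search_text and charset, which matches the two data(...) calls kept in the answer's code.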