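Trying to input data using jsoup

Time: 2017-09-10 19:39:35

Tags: java web-scraping jsoup

I am trying to submit data to a website. I will post the relevant excerpts of the page here; the full target page is the URL used in the code below.

The values are the street address number and the street name, inpNumber and inpStreet.

HTML: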



<td width="48">
  <input type="text" id="inpNumber" name="inpNumber" class="Input" size="5" value="" onkeypress="clearAction(this)" />
</td>

<td width="40">
  <input type="text" id="inpUnit" name="inpUnit" class="Input" size="4" value="" onkeypress="clearAction(this)" />
</td>

<td width="160">
  <input type="text" id="inpStreet" name="inpStreet" class="Input" size="20" value="" onkeypress="clearAction(this)" />
</td>

A valid query requires only inpStreet and inpNumber, and those are the values I need to enter.

What I have tried so far:

String url = "http://icare.fairfaxcounty.gov/ffxcare/search/commonsearch.aspx?mode=address";    
try {
    Connection.Response response = Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10 * 10000)
                .method(Connection.Method.POST)
                .data("inpNumber", "4127")
                .data("inpUnit", "")
                .data("inpStreet", "Winter Harbor")
                .data("btSearch", "")
                .data("inpSuffix1", "")
                .followRedirects(true)
                .execute();

    //parse the document from response
    Document document = response.parse();
    System.out.println(" extracting information from site ");

    FileWriter fw = new FileWriter("doc.html");
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write(document.html());
    bw.close();
} catch (Exception ex){
    ex.printStackTrace();
}

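I have also tried several variations of the code above, including more or fewer key/value pairs (setting the empty "" values I found by inspecting the request in Firebug), inspecting all of the returned values, and general changes to the Jsoup.connect(url) call.

The result I get in the doc.html file is the original, unchanged page. What am I doing wrong?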

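1 Answer:

Answer 0: (score: 1)

The information is sent as a payload in the request body, and the best way I have found to send it is requestBody(String). The page also expects the hidden __VIEWSTATE, __VIEWSTATEGENERATOR, and __EVENTVALIDATION values, so the code first fetches them with a GET request and then includes them in the POST. The following code is tested and working.

Imports: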

import java.io.BufferedWriter;
import java.io.FileWriter;

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import static java.net.URLEncoder.encode;

Code:

public static void main(String[] args) {
    String url = "http://icare.fairfaxcounty.gov/ffxcare/search/commonsearch.aspx?mode=address";
    String userAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0";

    try {

        // GET the hidden fields the server requires for validation
        // Note: consider extracting this into a method and calling it before each POST
        Elements inputs = Jsoup.connect(url)
                .userAgent(userAgent)
                .get().select("input");

        String eventValidation = encode(inputs.select("#__EVENTVALIDATION").attr("value"), "UTF-8");
        String viewStateGen = encode(inputs.select("#__VIEWSTATEGENERATOR").attr("value"), "UTF-8");
        String viewState = encode(inputs.select("#__VIEWSTATE").attr("value"), "UTF-8");


        int number = 4127;
        String street = encode("Winter Harbor", "UTF-8");

        // not necessary
        String unit = "";
        String suffix = "";

        Document document = Jsoup.connect(url)
                .userAgent(userAgent)
                .requestBody(
                        String.format(
                                "mode=ADDRESS"
                                + "&__VIEWSTATE=%s"
                                + "&__VIEWSTATEGENERATOR=%s"
                                + "&__EVENTVALIDATION=%s"
                                + "&inpNumber=%d"
                                + "&inpUnit=%s"
                                + "&inpStreet=%s"
                                + "&inpSuffix1=%s", 
                                viewState, viewStateGen, eventValidation,
                                number, unit, street, suffix))
                .post();


        System.out.println("Extracting information from the site...");

        // try-with-resources closes the writer even if write() throws
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("doc.html"))) {
            bw.write(document.html());
        }

        System.out.println("Done.");
    } catch (Exception ex) {
        //TODO Handle exceptions
        ex.printStackTrace();
    }

}
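
If the list of fields grows, the String.format call above can get hard to keep in sync with its arguments. As a minimal sketch (not part of the tested code above), the same kind of URL-encoded body could be built from a map. The class name FormBody and its encode method are hypothetical names chosen for illustration, and the __VIEWSTATE value in main is a placeholder, not a real token from the site.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
class FormBody {

    // URL-encodes a single name or value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Joins the fields as name=value pairs separated by '&',
    // in the iteration order of the map.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(enc(field.getKey()) + "=" + enc(field.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("mode", "ADDRESS");
        fields.put("__VIEWSTATE", "/wEPDw+abc="); // placeholder, not a real token
        fields.put("inpNumber", "4127");
        fields.put("inpUnit", "");
        fields.put("inpStreet", "Winter Harbor");
        System.out.println(encode(fields));
    }
}
```

The resulting string could then be passed to requestBody(String) in place of the String.format expression. Because the helper encodes every value itself, the hidden field values would go into the map raw, without the separate encode(...) calls used above.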