I want to extract some data from http://www.wettportal.com/quotenarchiv/.
The page has a JavaScript-driven search form:
<form id="archivesearchform" name="archivesearchform" method="post" action="">
...
<td class="ralign">Sportart:</td>
<td>
<select name="sport_id" id="sport_id" style="width:100%">
...
<td class="ralign">Land:</td>
<td>
<select name="region_id" id="region_id" style="width:100%;">
...
<td class="ralign">Liga:</td>
<td>
<select name="league_id" id="league_id" style="width:100%">
...
<td class="ralign">vom:</td>
<td>
<input type="text" name="fromdate" id="fromdate" style="width:100%" />
</td>
<td class="ralign">bis:</td>
<td>
<input type="text" name="tilldate" id="tilldate" style="width:100%" />
</td>
<td colspan="2"></td>
</tr>
<tr>
<td class="ralign">Teilnehmer:</td>
<td colspan="3"><input type="text" name="team" style="width:100%" /></td>
<td colspan="2"></td>
</tr>
</tbody>
and a submit button:
<tr>
<td class="lalign"></td>
<td class="calign"><input type="submit" name="btnSubmit" value="Suchen" /></td>
<td class="ralign"><div class="loading-animation" id="div_loading"></div></td>
</tr>
I tried this code:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class QAJesoupE {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.wettportal.com/quotenarchiv/")
                    .data("sport_id", "4")
                    .data("region_id", "16")
                    .data("league_id", "0")
                    .data("fromdate", "")
                    .data("tilldate", "")
                    .data("team", "")
                    // and other hidden fields which are being passed in post request.
                    .userAgent("Mozilla")
                    .post();
            System.out.println(doc); // prints the HTML source of the response
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
But I only get the HTML code without any search results. :-/
Can anyone please help me?
Thanks a lot in advance!
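Answer 0 (score: 2)
The site has a script that handles the form submission. Even though the form element declares method="post", the script actually sends a GET request and passes the data as URL parameters: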
http://www.wettportal.com/lib/ajax/getArchivedEvents.php?partner=wettportal&lang=de&sport_id=4&region_id=23&league_id=0&fromdate=&tilldate=&team=
Jsoup will build the request URL (with parameters) for you, but you have to send a GET request and include the X-Requested-With header (see below):
Document doc = Jsoup
        // Request the AJAX endpoint directly, not the page that hosts the form.
        .connect("http://www.wettportal.com/lib/ajax/getArchivedEvents.php")
        .data("sport_id", "4")
        .data("region_id", "16")
        .data("league_id", "0")
        .data("fromdate", "")
        .data("tilldate", "")
        .data("team", "")
        // Mark the request as an AJAX call, as the site's script does.
        .header("X-Requested-With", "XMLHttpRequest")
        .timeout(10000)
        .get();
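To make it clearer what URL such a GET request ends up asking for, here is a minimal sketch in plain Java (no Jsoup, Java 10 or later) that assembles the query string from the same parameters. The class name ArchiveUrlSketch and the helpers buildUrl and exampleParams are made up for illustration; the endpoint and the parameter names and values are copied from the URL quoted above. Whether the server actually needs partner and lang is an assumption I have not verified.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, for illustration only.
public class ArchiveUrlSketch {

    // Endpoint taken from the request URL quoted in the answer.
    static final String BASE = "http://www.wettportal.com/lib/ajax/getArchivedEvents.php";

    // Appends the parameters to the base URL as a URL-encoded query string.
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    // Parameters as they appear in the quoted URL; LinkedHashMap keeps their order.
    static Map<String, String> exampleParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("partner", "wettportal");
        params.put("lang", "de");
        params.put("sport_id", "4");
        params.put("region_id", "23");
        params.put("league_id", "0");
        params.put("fromdate", "");
        params.put("tilldate", "");
        params.put("team", "");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(BASE, exampleParams()));
    }
}
```

Running main prints the same URL as the one shown above, which is handy for comparing against what the browser's network tab reports before wiring the parameters into Jsoup.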