I'm trying to fetch this page with jsoup: http://poalimparents.bankhapoalim.co.il/
But all I get back is:
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<script>window.rbzns = {fiftyeightkb: 43200000, days_in_week : 1};</script>
<script src="//d1a702rd0dylue.cloudfront.net/js/sugarman/v7/flat.js"></script>
<script>rbzns.challdomain="poalimparents.bankhapoalim.co.il"; rbzns.ctrbg="FBJbOCP+Rehoy7Oy/WdwW78giok75ZJ41qiRAeMY6ngbkLDEoRQiaRnij/E1vDpJr8bXfF2RriK5XaIq/Hp55vlAaMCPBIVryBF/YYXoti09rQmZeDa16289c+L2T8eFOCCIjmmtSn7gp75lWrKDHxJgS7Te/RxMGL/93TjdGxpofgMceO/Z2y/d7oCYNO/HKn4ZciE4aqCU8AU6rtyVjH0HxWz47/pps9uqcV0VnR/up4gHLztME+GHfJzjZ80Vy/14g5wvCKRtZU7P6I3zgQ==";rbzns.rbzreqid="bnhp-rbzr0131343537323737333137e0b31050bf436236"; winsocks(true);</script>
</body>
</html>
I'm not trying to get the script tags in the page.
Why don't I get any of the other tags? How can I fetch a page like this?
Thanks a lot.
Answer 0 (score: 0)
Try adding some of the headers a browser would send. A cookie may also be required; you can obtain one by visiting the page in a "normal" browser.
Example (curl):
curl "http://poalimparents.bankhapoalim.co.il/" -H "Host: poalimparents.bankhapoalim.co.il" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Accept-Language: de,en-US;q=0.7,en;q=0.3" --compressed -H "DNT: 1" -H "Cookie: rbzreqid=bnhp-rbzr0231343537323737353630894b901088a07d30; rbzid=YnVIYjc0K2RmTDFXZFdiZy95UTNrOUJ0NEp0MzgzbW9oNlhHcFdVMU1mR25oT1NGQXowdGdZbjR3WktuQ2ZBZ0ljUTVCbHFDK213bGZXRk1DekJMMnhQMVFZVG4zcHpNT1lEWTRpM3FVeiswbkxtaFVCR25CU0taQ2pnNU1IQ1Z5WDc2ZWgxa2ZxR25vR1JadFpTVThidWs0d0s5QUF2YUVRSG1QcUpsS1ltRGpYNzhPR0lpRDNkak1VRmVxdm5nY0RpM1dEUnYrWU1rR2R1c3pWY2JGZHd6ODlOdkxHUkxuOW03N0VzWC9oOD1AQEAwQEBALTc0MDc0MDczNDA-" -H "Connection: keep-alive" -H "If-Modified-Since: Mon, 22 Feb 2016 11:40:18 GMT" -H "If-None-Match: W/""af3920fb581df1e4de68d46a4694689a""" -H "Cache-Control: max-age=0"
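The same idea can be sketched in Java. This is only a minimal sketch under a few assumptions: it uses the JDK's own `java.net.http` client (Java 11+) rather than jsoup's connection, the class and helper names (`BrowserLikeFetch`, `browserHeaders`, `cookieHeader`, `buildRequest`) are made up for illustration, and the cookie values are placeholders you would have to copy from your own browser session. Whether the site accepts the request still depends on those cookies being valid.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of jsoup or of the original answer.
class BrowserLikeFetch {

    // Headers taken from the curl example. Host and Connection are left out
    // because java.net.http sets them itself and rejects manual values.
    static Map<String, String> browserHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0");
        headers.put("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "de,en-US;q=0.7,en;q=0.3");
        headers.put("DNT", "1");
        return headers;
    }

    // Joins cookie name/value pairs into a single Cookie header value.
    static String cookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Builds a GET request carrying the browser-like headers and the cookies.
    static HttpRequest buildRequest(String url, Map<String, String> cookies) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        browserHeaders().forEach(builder::header);
        if (!cookies.isEmpty()) {
            builder.header("Cookie", cookieHeader(cookies));
        }
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java BrowserLikeFetch.java <url>");
            return;
        }
        // Placeholder values (assumption): copy the real ones from your browser.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("rbzreqid", "PASTE_VALUE_FROM_BROWSER");
        cookies.put("rbzid", "PASTE_VALUE_FROM_BROWSER");

        HttpRequest request = buildRequest(args[0], cookies);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If the response body turns out to be the real page, it can be handed to `Jsoup.parse(html, baseUrl)` for the usual selector work. Staying entirely inside jsoup should also be possible with `Jsoup.connect(url).userAgent(...).header(name, value).cookie(name, value).get()`, using the same header and cookie values.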