Question

I am trying to fetch search results from a digital library by requesting the following URL:

http://search.lib.monash.edu/primo_library/libweb/action/search.do?dscnt=0&frbg=&scp.scps=scope%3A%2861MONASH_AU%29%2Cscope%3A%28catcarm%29%2Cscope%3A%28arrow%29%2Cscope%3A%28arrow%29%2Cscope%3A%28MUA%29%2Cscope%3A%28catau%29%2Cprimo_central_multiple_fe&tab=default_tab&dstmp=1397132268717&srt=rank&ct=search&mode=Basic&dum=true&indx=1&vl%28freeText0%29=java&fn=search&vid=MON

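This URL works fine in any web browser, but when I try to read it from my Java application, it returns the following HTML file, which appears to redirect the application to another page: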

<!-- filename: sso -->
<html>
<head> 
<title>Login </title> 
<!-- START filename: meta-tags.pds --> 
<meta http-equiv="Cache-Control" content="no-cache" /> 
<meta http-equiv="Pragma" content="no-cache" /> 
<meta http-equiv="Expires" content="Sun, 06 Nov 1994 08:49:37 GMT" /> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<!-- END   filename: meta-tags.pds --> 
<link rel="stylesheet" href="http://monash-dc05.hosted.exlibrisgroup.com:8991/PDSMExlibris.css"   type="text/css" /> 
</head> 
<body onload="location = '/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&amp;vid=MON&amp;vid=MON&amp;dscnt=0&amp;targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&amp;frbg=&amp;tab=default%5Ftab&amp;dstmp=1397132076758&amp;srt=rank&amp;ct=search&amp;mode=Basic&amp;dum=true&amp;indx=1&amp;tb=&amp;vl%28freeText0%29=java&amp;fn=search&amp;pds_handle=GUEST';"> 
 <noscript> 
 <div id="header"> 
 <div> 
 <img src="http://monash-dc05.hosted.exlibrisgroup.com:8991//exlibris/primo/p4_1/pds/html_form/icon/exlibrislogo.jpg" alt="Exlibris Logo" />
 <p>&nbsp;</p> 
 </div> 
 </div> 
 <div id="connect"> 
 <a href="/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&amp;vid=MON&amp;vid=MON&amp;dscnt=0&amp;targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&amp;frbg=&amp;tab=default%5Ftab&amp;dstmp=1397132076758&amp;srt=rank&amp;ct=search&amp;mode=Basic&amp;dum=true&amp;indx=1&amp;tb=&amp;vl%28freeText0%29=java&amp;fn=search&amp;pds_handle=GUEST">Return from Check SSO </a> 
 </div>    
 </noscript>
 </body>
  </html>

I hard-coded the page that my application is redirected to; the code is simple:

String url="http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&amp;vid=MON&amp;vid=MON&amp;dscnt=0&amp;targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&amp;frbg=&amp;tab=default%5Ftab&amp;dstmp=1397132076758&amp;srt=rank&amp;ct=search&amp;mode=Basic&amp;dum=true&amp;indx=1&amp;tb=&amp;vl%28freeText0%29=java&amp;fn=search&amp;pds_handle=GUEST";
Document d=Jsoup.connect(url).timeout(60000).get();

The page the application is redirected to (defined in the body onload) is not available.

My question is: how can I use my Java application to get the HTML file from the URL above, the same way I get it in the browser?

This digital library has no API or any publicly exposed service; otherwise I would use them.

Answer 1

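In the last code snippet (String url = "..."), replace every &amp; with &. The &amp; sequences are HTML entity escapes copied from the page source; in a Java string literal the query parameters must be separated by a plain & character.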
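
As a rough illustration of that replacement, here is a minimal sketch using only the standard library. The class name UrlFix and the method unescapeAmpersands are hypothetical helpers made up for this example (they are not part of Jsoup), and the URL is a shortened stand-in for the one in the question. The Jsoup call is left as a comment so the snippet runs without the library or network access.

```java
// Hypothetical helper for illustration; not part of Jsoup or the original post.
class UrlFix {
    // Turns HTML-escaped ampersands (&amp;) back into plain '&' so the
    // query string is split into the intended parameters.
    static String unescapeAmpersands(String escapedUrl) {
        return escapedUrl.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the URL copied from the page source.
        String copied = "http://search.lib.monash.edu:80/primo_library/libweb/action/login.do"
                + "?afterPDS=true&amp;vid=MON&amp;fn=search&amp;pds_handle=GUEST";
        String url = unescapeAmpersands(copied);
        System.out.println(url);
        // With Jsoup on the classpath, the cleaned URL could then be fetched
        // the same way as in the question:
        // Document d = Jsoup.connect(url).timeout(60000).get();
    }
}
```

This only handles the &amp; escape that appears in the question; a URL containing other HTML entities would need a fuller unescaping step.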

