My Java application tries to read the content at the following URL: https://www.iplocation.net/?query=62.92.63.48
I used the following method:
StringBuffer readFromUrl(String Url)
{
  StringBuffer sb=new StringBuffer();
  BufferedReader in=null;
  try
  {
    in=new BufferedReader(new InputStreamReader(new URL(Url).openStream()));
    String inputLine;
    while ((inputLine=in.readLine()) != null) sb.append(inputLine+"\n");
    in.close();
  }
  catch (Exception e) { e.printStackTrace(); }
  finally
  {
    try
    {
      if (in!=null)
      {
        in.close();
        in=null;
      }
    }
    catch (Exception ex) { ex.printStackTrace(); }
  }
  return sb;
}
It usually works for other URLs, but for this one the result differs from what the browser displays. It looks like this:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script>
(function(){function getSessionCookies(){var cookieArray=new Array();var cName=/^\s?incap_ses_/;var c=document.cookie.split(";");for(var i=0;i<c.length;i++){var key=c[i].substr(0,c[i].indexOf("="));var value=c[i].substr(c[i].indexOf("=")+1,c[i].length);if(cName.test(key)){cookieArray[cookieArray.length]=value}}return cookieArray}function setIncapCookie(vArray){var res;try{var cookies=getSessionCookies();var digests=new Array(cookies.length);for(var i=0;i<cookies.length;i++){digests[i]=simpleDigest((vArray)+cookies[i])}res=vArray+",digest="+(digests.join())}catch(e){res=vArray+",digest="+(encodeURIComponent(e.toString()))}createCookie("___utmvc",res,20)}function simpleDigest(mystr){var res=0;for(var i=0;i<mystr.length;i++){res+=mystr.charCodeAt(i)}return res}function createCookie(name,value,seconds){var expires="";if(seconds){var date=new Date();date.setTime(date.getTime()+(seconds*1000));var expires="; expires="+date.toGMTString()}document.cookie=name+"="+value+expires+"; path=/"}function test(o){var res="";var vArray=new Array();for(var j=0;j<o.length;j++){var test=o[j][0];switch(o[j][1]){case"exists":try{if(typeof(eval(test))!="undefined"){vArray[vArray.length]=encodeURIComponent(test+"=true")}else{vArray[vArray.length]=encodeURIComponent(test+"=false")}}catch(e){vArray[vArray.length]=encodeURIComponent(test+"=false")}break;case"value":try{try{res=eval(test);if(typeof(res)==="undefined"){vArray[vArray.length]=encodeURIComponent(test+"=undefined")}else if(res===null){vArray[vArray.length]=encodeURIComponent(test+"=null")}else{vArray[vArray.length]=encodeURIComponent(test+"="+res.toString())}}catch(e){vArray[vArray.length]=encodeURIComponent(test+"=cannot evaluate");break}break}catch(e){vArray[vArray.length]=encodeURIComponent(test+"="+e)}case"plugin_extentions":try{var extentions=[];try{i=extentions.indexOf("i")}catch(e){vArray[vArray.length]=encodeURIComponent("plugin_ext=indexOf is not a function");break}try{var num=navigator.plugins.length 
if(num==0||num==null){vArray[vArray.length]=encodeURIComponent("plugin_ext=no plugins");break}}catch(e){vArray[vArray.length]=encodeURIComponent("plugin_ext=cannot evaluate");break}for(var i=0;i<navigator.plugins.length;i++){if(typeof(navigator.plugins[i])=="undefined"){vArray[vArray.length]=encodeURIComponent("plugin_ext=plugins[i] is undefined");break}var filename=navigator.plugins[i].filename var ext="no extention";if(typeof(filename)=="undefined"){ext="filename is undefined"}else if(filename.split(".").length>1){ext=filename.split('.').pop()}if(extentions.indexOf(ext)<0){extentions.push(ext)}}for(i=0;i<extentions.length;i++){vArray[vArray.length]=encodeURIComponent("plugin_ext="+extentions[i])}}catch(e){vArray[vArray.length]=encodeURIComponent("plugin_ext="+e)}break}}vArray=vArray.join();return vArray}var o=[["navigator","exists"],["navigator.vendor","value"],["navigator.appName","value"],["navigator.plugins.length==0","value"],["navigator.platform","value"],["navigator.webdriver","value"],["platform","plugin_extentions"],["ActiveXObject","exists"],["webkitURL","exists"],["_phantom","exists"],["callPhantom","exists"],["chrome","exists"],["yandex","exists"],["opera","exists"],["opr","exists"],["safari","exists"],["awesomium","exists"],["puffinDevice","exists"],["navigator.cpuClass","exists"],["navigator.oscpu","exists"],["navigator.connection","exists"],["window.outerWidth==0","value"],["window.outerHeight==0","value"],["window.WebGLRenderingContext","exists"],["document.documentMode","value"],["eval.toString().length","value"]];try{setIncapCookie(test(o));document.createElement("img").src="/_Incapsula_Resource?SWKMTFSR=1&e="+Math.random()}catch(e){img=document.createElement("img");img.src="/_Incapsula_Resource?SWKMTFSR=1&e="+e}})();
</script>
<script>
(function() {
var z="";var b="7472797B766172207868723B76617220743D6E6577204461746528292E67657454696D6528293B766172207374617475733D2273746128......6F6465555249436F6D706F6E656E74287374617475732B222028222B74696D696E672E6A6F696E28292B222922297D3B";for (var i=0;i<b.length;i+=2){z=z+parseInt(b.substring(i, i+2), 16)+",";}z = z.substring(0,z.length-1); eval(eval('String.fromCharCode('+z+')'));})();
</script></head>
<body>
<iframe style="display:none;visibility:hidden;" src="//content.incapsula.com/jsTest.html" id="gaIframe"></iframe>
</body></html>
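So what is the correct way to read the HTML content as it is displayed in the browser in this case?
Edit: after reading the suggestions, I updated my program as follows: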
StringBuilder response=new StringBuilder();
String USER_AGENT="Mozilla/5.0",inputLine;
BufferedReader in=null;
try
{
  HttpURLConnection con=(HttpURLConnection)new URL(Url).openConnection();
  con.setRequestMethod("GET");
  con.setRequestProperty("Accept-Charset","UTF-8");
  con.setRequestProperty("User-Agent",USER_AGENT); // Add request header
  int responseCode=con.getResponseCode();
  in=new BufferedReader(new InputStreamReader(con.getInputStream()));
  while ((inputLine=in.readLine())!=null) { response.append(inputLine); }
  in.close();
}
catch (Exception e) { e.printStackTrace(); }
finally
{
  try { if (in!=null) in.close(); }
  catch (Exception ex) { ex.printStackTrace(); }
}
return response.toString();
But it still did not work. The response I got looks like this:
<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=24&xinfo=8-75933493-0 0NNN RT(1479758027223 127) q(0 -1 -1 -1) r(0 -1) B12(4,315,0) U10000&incident_id=516000100118713619-514529209419563176&edet=12&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 516000100118713619-514529209419563176</iframe></body></html>
Can someone show some sample code that works?
Thanks to @thatguy, I modified my program as follows:
import java.util.*;
import java.util.concurrent.*;
import java.io.*;
import java.net.*;
import java.util.Map.Entry;

public class Read_From_Url_Runner implements Callable<String[]>
{
  int Id;
  String Read_From_Url_Result[]=null,IP_Location_Url="https://www.iplocation.net/?query=[IP]",IP="62.92.63.48",Cookie,Result[],A_Url;

  public Read_From_Url_Runner(int Id)
  {
    this.Id=Id;
    A_Url=IP_Location_Url.replace("[IP]",IP);
    Cookie=getIncapsulaCookie(A_Url);
    Out("Cookie = [ "+Cookie+" ]");
    try
    {
      Result=call();
      // for (int i=0;i<Result.length;i++) Out(Result[i]);
    }
    catch (Exception e) { e.printStackTrace(); }
  }

  public String[] call() throws InterruptedException
  {
    String Text;
    try
    {
      Text=readUrl(A_Url,Cookie);
      Out(Text);
    }
    catch (Exception e)
    {
      Out(" --> Error in data : IP = "+IP);
      // e.printStackTrace();
    }
    return Read_From_Url_Result;
  }

  public static String readUrl(String url,String incapsulaCookie)
  {
    StringBuilder response=new StringBuilder();
    String USER_AGENT="Mozilla/5.0",inputLine;
    BufferedReader in=null;
    try
    {
      HttpURLConnection connection=(HttpURLConnection)new URL(url).openConnection();
      connection.setRequestMethod("GET");
      connection.setRequestProperty("Accept","text/html; charset=UTF-8");
      connection.setRequestProperty("User-Agent",USER_AGENT);
      connection.setDoInput(true);
      connection.setDoOutput(true);
      connection.setRequestProperty("Cookie",incapsulaCookie); // Set the Incapsula cookie
      Out(connection.getRequestProperty("Cookie"));
      in=new BufferedReader(new InputStreamReader(connection.getInputStream()));
      while ((inputLine=in.readLine())!=null) { response.append(inputLine+"\n"); }
      in.close();
    }
    catch (Exception e) { e.printStackTrace(); }
    finally
    {
      try { if (in!=null) in.close(); }
      catch (Exception ex) { ex.printStackTrace(); }
    }
    return response.toString();
  }

  public static String getIncapsulaCookie(String url)
  {
    String USER_AGENT="Mozilla/5.0",incapsulaCookie=null,visid=null,incap=null; // Cookies for Incapsula, preserve order
    BufferedReader in=null;
    try
    {
      HttpURLConnection cookieConnection=(HttpURLConnection)new URL(url).openConnection();
      cookieConnection.setRequestMethod("GET");
      cookieConnection.setRequestProperty("Accept","text/html; charset=UTF-8");
      cookieConnection.setRequestProperty("User-Agent",USER_AGENT);
      cookieConnection.connect();
      for (Entry<String,List<String>> header : cookieConnection.getHeaderFields().entrySet())
      {
        if (header.getKey()!=null && header.getKey().equals("Set-Cookie")) // Incapsula gives you the required cookies
        {
          for (String cookieValue : header.getValue()) // Search for the desired cookies
          {
            if (cookieValue.contains("visid")) visid=cookieValue.substring(0,cookieValue.indexOf(";")+1);
            if (cookieValue.contains("incap_ses")) incap=cookieValue.substring(0,cookieValue.indexOf(";"));
          }
        }
      }
      incapsulaCookie=visid+" "+incap;
      cookieConnection.disconnect();
    }
    catch (Exception e) { e.printStackTrace(); }
    finally
    {
      try { if (in!=null) in.close(); }
      catch (Exception ex) { ex.printStackTrace(); }
    }
    return incapsulaCookie;
  }

  private static void out(String message) { System.out.print(message); }
  private static void Out(String message) { System.out.println(message); }

  public static void main(String[] args)
  {
    final Read_From_Url_Runner demo=new Read_From_Url_Runner(0);
  }
}
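But this only gets the first part of the response, as shown below:
What I really want is the following:
Running my program produced this result:

Answer 0 (score: 3)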
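The problem you are facing is probably, for the most part, the HTTP request headers, which you do not set explicitly. Websites are often served in different representations depending on properties in the HTTP headers (and payload), for example to serve desktop and mobile clients appropriately. Your code sets nothing, so you send whatever default headers the library uses. If you inspect the exact HTTP headers your browser sends, there are most likely differences (such as the user agent or the encoding). If you rebuild those headers in your code, the result should be the same.
In addition, you can use HttpURLConnection, which lets you easily set or read the corresponding HTTP headers, for example as in this SO post. Otherwise, for URLConnection, see here.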
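To illustrate what "rebuilding the headers" could look like, here is a minimal sketch. The class name BrowserLikeHeaders, both method names, and the header values are assumptions for illustration; the real values should be copied from the browser's network inspector. Opening the connection object does not send anything yet, so the headers can be inspected before the request goes out.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class BrowserLikeHeaders {
    // Hypothetical header values; replace them with the ones your browser actually sends.
    public static Map<String, String> browserHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0");
        headers.put("Accept", "text/html");
        headers.put("Accept-Language", "en-US,en;q=0.8");
        return headers;
    }

    // Creates the connection object (no network traffic yet) and applies the headers.
    public static HttpURLConnection prepare(String url, Map<String, String> headers)
            throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("GET");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            con.setRequestProperty(h.getKey(), h.getValue());
        }
        return con;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection con =
                prepare("https://www.iplocation.net/?query=62.92.63.48", browserHeaders());
        // Must be called before connecting; shows the headers set so far.
        System.out.println(con.getRequestProperties());
    }
}
```

Comparing this output with the browser's request headers is a quick way to spot what is missing.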
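Further investigation
Your approach repeatedly ends up on a special error page, which indicates that the site uses additional security features from Incapsula. The page you get looks like this:
When I inspected the headers, I noticed that two cookie strings need to be present so that you reach the website directly instead of the security check: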
visid_incap_...=...
incap_ses_..._...=...
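What you can do is hit the error page with a single request, which gives you both cookie strings in the Set-Cookie headers. Then you can request the website directly with the Cookie header set to visid_incap_...=...; incap_ses_..._...=.... You can repeat the request as many times as you like until the cookies expire, which you can detect simply by checking for the error page. Here is working code. It obviously lacks style and extra checks, but it solves your problem. The rest is up to you.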
public static String getIncapsulaCookie(String url) {
    String USER_AGENT = "Mozilla/5.0";
    BufferedReader in = null;
    String incapsulaCookie = null;
    try {
        HttpURLConnection cookieConnection =
                (HttpURLConnection) new URL(url).openConnection();
        cookieConnection.setRequestMethod("GET");
        cookieConnection.setRequestProperty("Accept",
                "text/html; charset=UTF-8");
        cookieConnection.setRequestProperty("User-Agent", USER_AGENT);
        // Disable 'keep-alive'
        cookieConnection.setRequestProperty("Connection", "close");
        // Cookies for Incapsula, preserve order
        String visid = null;
        String incap = null;
        cookieConnection.connect();
        for (Entry<String, List<String>> header : cookieConnection
                .getHeaderFields().entrySet()) {
            // Incapsula gives you the required cookies
            if (header.getKey() != null
                    && header.getKey().equals("Set-Cookie")) {
                // Search for the desired cookies
                for (String cookieValue : header.getValue()) {
                    if (cookieValue.contains("visid")) {
                        visid = cookieValue.substring(0,
                                cookieValue.indexOf(";") + 1);
                    }
                    if (cookieValue.contains("incap_ses")) {
                        incap = cookieValue.substring(0,
                                cookieValue.indexOf(";"));
                    }
                }
            }
        }
        incapsulaCookie = visid + " " + incap;
        // Explicitly disconnect, also essential in this method!
        cookieConnection.disconnect();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (in != null)
                in.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    return incapsulaCookie;
}
This method extracts the Incapsula cookies for you. Here is a modified version of your method that uses the cookie:
public static String readUrl(String url, String incapsulaCookie) {
    StringBuilder response = new StringBuilder();
    String USER_AGENT = "Mozilla/5.0", inputLine;
    BufferedReader in = null;
    try {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "text/html; charset=UTF-8");
        connection.setRequestProperty("User-Agent", USER_AGENT);
        // Set the Incapsula cookie
        connection.setRequestProperty("Cookie", incapsulaCookie);
        in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (in != null)
                in.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    return response.toString();
}
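A note on the cookie assembly in getIncapsulaCookie(...): if either cookie is absent from the response, visid + " " + incap produces a string containing the literal text null. A more defensive variant of that step might look like the sketch below. The class name IncapsulaCookies, the method toCookieHeader, and the sample cookie values are made up for illustration; only the visid_incap_ and incap_ses_ prefixes come from the headers described above.

```java
import java.util.Arrays;
import java.util.List;

public class IncapsulaCookies {
    // Builds a Cookie request header from Set-Cookie response values,
    // keeping the visid cookie first. Returns null if either cookie is missing.
    public static String toCookieHeader(List<String> setCookieValues) {
        String visid = null;
        String incap = null;
        for (String value : setCookieValues) {
            // Keep only "name=value" and drop attributes such as path or expires
            String pair = value.split(";", 2)[0].trim();
            if (pair.startsWith("visid_incap_")) {
                visid = pair;
            } else if (pair.startsWith("incap_ses_")) {
                incap = pair;
            }
        }
        if (visid == null || incap == null) {
            return null;
        }
        return visid + "; " + incap;
    }

    public static void main(String[] args) {
        // Made-up cookie values, for illustration only
        List<String> sample = Arrays.asList(
                "visid_incap_123=abc; path=/; HttpOnly",
                "incap_ses_1_123=xyz; path=/");
        System.out.println(toCookieHeader(sample));
        // prints: visid_incap_123=abc; incap_ses_1_123=xyz
    }
}
```

A caller can then treat a null return value as "no cookies received" instead of sending a malformed Cookie header.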
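As far as I observed, the user agent and the other properties do not seem to matter. You can now call getIncapsulaCookie(String url) once, or whenever you need a new cookie, to obtain the cookie, and then call readUrl(String url, String incapsulaCookie) as many times as you like to request the page until the cookie expires. The result is the complete HTML page, as shown in this partial screenshot: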
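Important detail: there are two essential commands in the getIncapsulaCookie(...) method, namely cookieConnection.setRequestProperty("Connection", "close"); and cookieConnection.disconnect();. Both are required if you want to call readUrl(...) immediately afterwards. If you omit them, the HTTP connection stays alive on the server side after the cookie has been received, and the next call to readUrl(...) returns the wrong page. You can try this yourself by omitting these commands, calling getIncapsulaCookie(...), waiting 5 to 65 seconds, and then calling readUrl(...). You will see that this works too, because the connection times out on its own. See also here.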
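The "check for the error page" step for detecting an expired cookie could be sketched as follows. Everything here is an assumption for illustration: the class IncapsulaSession, the marker strings (taken from the two blocked responses quoted in the question, so they may need adjusting), and the two function parameters, which stand in for getIncapsulaCookie and readUrl.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

public class IncapsulaSession {
    private final Function<String, String> cookieSource;           // e.g. getIncapsulaCookie
    private final BiFunction<String, String, String> pageReader;   // e.g. readUrl
    private String cookie;

    public IncapsulaSession(Function<String, String> cookieSource,
                            BiFunction<String, String, String> pageReader) {
        this.cookieSource = cookieSource;
        this.pageReader = pageReader;
    }

    // Heuristic only: looks for text seen in the blocked responses from the question.
    public static boolean isErrorPage(String html) {
        return html == null
                || html.isEmpty()
                || html.contains("Incapsula incident ID")
                || html.contains("incapsula.com/jsTest.html");
    }

    // Reuses the cached cookie and refreshes it once if the error page comes back.
    public String fetch(String url) {
        if (cookie == null) {
            cookie = cookieSource.apply(url);
        }
        String html = pageReader.apply(url, cookie);
        if (isErrorPage(html)) {
            cookie = cookieSource.apply(url); // cookie has probably expired
            html = pageReader.apply(url, cookie);
        }
        return html;
    }

    public static void main(String[] args) {
        // Stub functions simulating an expired first cookie; no network access
        int[] calls = {0};
        IncapsulaSession session = new IncapsulaSession(
                url -> "cookie" + (++calls[0]),
                (url, c) -> c.equals("cookie1")
                        ? "Request unsuccessful. Incapsula incident ID: 0"
                        : "<html>real page</html>");
        System.out.println(session.fetch("https://example.invalid/"));
        // prints: <html>real page</html>
    }
}
```

With the two methods above placed in a class, say a hypothetical Fetcher, the session would be created as new IncapsulaSession(Fetcher::getIncapsulaCookie, Fetcher::readUrl).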