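I have the following problem: for some URLs, the BufferedReader reaches a connection timeout and throws an exception that breaks the whole program. I need to check how long the connection has been open. If that time reaches a threshold (which must be lower than the timeout), the program should skip opening the stream for that URL and move on to the next one. Alternatively, it should handle the timeout in a way that does not stop the program. Any ideas how to do this?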
URL url = new URL(line);
URLConnection connection = url.openConnection();
if (connection instanceof HttpURLConnection) {
    HttpURLConnection httpConn = (HttpURLConnection) connection;
    int statusCode = httpConn.getResponseCode(); // this connects too, and is outside the try below
    if (statusCode >= 200 && statusCode < 300) { // only crawl 2xx responses
        try {
            // note: url.openStream() opens a second connection to the same URL
            BufferedReader brURL = new BufferedReader(new InputStreamReader(url.openStream()));
            while ((tempLine = brURL.readLine()) != null) {
                UrlMatcher = UrlPattern.matcher(tempLine);
                java.util.logging.Logger.getLogger(SimpleCrawler.class.getName()).log(Level.SEVERE, tempLine);
                if (UrlMatcher.find()) {
                    String resultURL = UrlMatcher.group();
                    fop.write(resultURL.toLowerCase().getBytes());
                    fop.write(System.getProperty("line.separator").getBytes());
                    System.out.println(resultURL);
                }
            }
        } catch (ConnectException ex) {
            // swallowed: this URL is skipped
        }
    }
}
This causes the following exception:
Exception in thread "main" java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1512)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at java.net.URL.openStream(URL.java:1038)
at simplecrawler.SimpleCrawler.main(SimpleCrawler.java:61)
EDIT: After adding the try-catch, the program now gets stuck in an infinite loop in another part of the execution.
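EDIT 2
After adding a logger before if(UrlMatcher.find()) inside the while loop, this is what it logs once it gets into the infinite loop (for clarity, I have included the last match before the log):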
rum-static.pingdom.net/prum.min.js //the last match
SEVERE: var flashvars = {};
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: flashvars.enableAPI = "true";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: flashvars.galleryURL = "/svgallerysource.asp?galleryid=685";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: var params = {};
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: params.bgcolor = "222222";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: params.allowfullscreen = false;
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: params.allowscriptaccess = "always";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: params.wmode = "transparent";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: var attributes = {};
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: attributes.id = "svInstance";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: attributes.name = "svInstance";
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: simpleviewer.ready(function () {
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: simpleviewer.load('flashContent', '920', '420', '222222', true, flashvars, params, attributes, true);
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: });
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: </script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <link href="http://cdn-images.mailchimp.com/embedcode/slim-081711.css" rel="stylesheet" type="text/css">
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <style type="text/css">
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; }
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: </style>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <script type="text/javascript" src="/jplayer/jquery.jplayer.min.js"></script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <script type="text/javascript" src="/jplayer/jquery.jplayer.inspector.js"></script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <link rel="stylesheet" href="/css/colorbox.css" />
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: var _prum = [['id', '5397955dabe53dbb3ea78d70'],
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: ['mark', 'firstbyte', (new Date()).getTime()]];
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: (function() {
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: var s = document.getElementsByTagName('script')[0]
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: , p = document.createElement('script');
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: p.async = 'async';
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: p.src = '//rum-static.pingdom.net/prum.min.js';
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: s.parentNode.insertBefore(p, s);
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: })();
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: </script>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: <style>
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: body
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: {
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: background-color: #ffffff;
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: }
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: div#bodycontainer-home
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: {
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: background-color:
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: #ffffff;
Nov 27, 2015 6:53:27 PM simplecrawler.SimpleCrawler openConnection
SEVERE: background-image:url(/images/uploaded/540973958472458.png);
Answer (score: 1)
You should set a connect timeout with setConnectTimeout and then catch the SocketTimeoutException that is thrown when it expires.
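try {
    HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
    con.setConnectTimeout(5000); // give up if no connection is established within 5 seconds
    return con.getResponseCode() == HttpURLConnection.HTTP_OK;
} catch (java.net.SocketTimeoutException e) {
    return false; // timed out: treat the URL as unreachable and move on
}
// other IOExceptions (e.g. ConnectException) still propagate to the caller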
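Here is a slightly fuller sketch of the same idea for when the page body is also needed. It sets a read timeout as well as a connect timeout, because a server that accepts the connection and then stalls is only caught by setReadTimeout. It reads through the same HttpURLConnection instead of opening a second connection with url.openStream(). It returns null so the caller can skip the URL. The class TimeoutFetch, the method fetchLines and the timeout values are hypothetical, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TimeoutFetch {

    /**
     * Hypothetical helper: returns the body lines of a 2xx HTTP page,
     * or null if the URL timed out, failed, or returned another status.
     */
    static List<String> fetchLines(String address, int connectMs, int readMs) {
        HttpURLConnection con = null;
        try {
            URLConnection raw = new URL(address).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return null; // not http/https: nothing to crawl
            }
            con = (HttpURLConnection) raw;
            con.setConnectTimeout(connectMs); // limit for establishing the TCP connection
            con.setReadTimeout(readMs);       // limit for each blocking read once connected
            int status = con.getResponseCode();
            if (status < 200 || status >= 300) {
                return null;
            }
            List<String> lines = new ArrayList<>();
            // Read from the same connection instead of opening a second one with url.openStream()
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        } catch (SocketTimeoutException e) {
            System.err.println("Timed out, skipping " + address + ": " + e.getMessage());
            return null;
        } catch (IOException e) {
            // ConnectException, UnknownHostException, malformed URL, ...: skip this URL too
            System.err.println("Failed, skipping " + address + ": " + e);
            return null;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }
}
```

In the question's loop, a null result would just mean moving on to the next URL, so one slow host no longer stops the crawl.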
See the documentation for details.
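The question also asks about measuring how long a URL has been open and giving up past a threshold. Connect and read timeouts apply to each operation separately, so a server that keeps trickling out a line every few hundred milliseconds never trips them. A minimal sketch of an overall per-URL budget, assuming a single-threaded crawler, records the start time and checks the elapsed time after every line, while setReadTimeout bounds how long any single blocking read can take. Because the check runs between lines, it is not a hard deadline: a single very long line can still overrun it. BudgetedFetch, readWithBudget and its parameters are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class BudgetedFetch {

    /**
     * Hypothetical helper: reads a page line by line, but abandons it once
     * more than budgetMs has passed since this call started.
     * Returns null if the URL was skipped for any reason.
     */
    static List<String> readWithBudget(String address, int timeoutMs, long budgetMs) {
        long start = System.nanoTime(); // the budget also covers connecting
        HttpURLConnection con = null;
        try {
            URLConnection raw = new URL(address).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return null;
            }
            con = (HttpURLConnection) raw;
            con.setConnectTimeout(timeoutMs);
            con.setReadTimeout(timeoutMs); // bounds each individual blocking read
            List<String> lines = new ArrayList<>();
            // getInputStream() throws an IOException for 4xx/5xx responses, so those are skipped too
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line);
                    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    if (elapsedMs > budgetMs) {
                        System.err.println("Over budget after " + elapsedMs + " ms, skipping " + address);
                        return null; // try-with-resources closes the stream
                    }
                }
            }
            return lines;
        } catch (IOException e) {
            // covers SocketTimeoutException, ConnectException, HTTP error statuses, ...
            System.err.println("Skipping " + address + ": " + e);
            return null;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }
}
```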