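I have Java code that scrapes page content. I run 2500 threads at a time, and each thread has 100 URLs to scrape. Almost all threads finish successfully, but a few hang forever without throwing any exception. The production server runs Ubuntu. The code gets stuck on the following line: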
InputStream in = urlConnection.getInputStream();
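I have already set connect and read timeouts, and they normally work. For a small number of threads, however, even the read timeout does not take effect and the thread hangs forever. I have tried many approaches, and all of them failed.
I even killed the hung threads with thread.stop() (a deprecated method), but the TCP connections of the hung threads still exist on the Linux server.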
java 7325 root 2675u IPv4 284078467 0t0 TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:37068->104.131.210.5:22225 (ESTABLISHED)
java 7325 root 2688u IPv4 284077787 0t0 TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:38132->104.131.210.5:22225 (ESTABLISHED)
java 7325 root 2723u IPv4 284057771 0t0 TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:43661->104.131.210.5:22225 (ESTABLISHED)
Does anyone have an idea how I can debug and fix this problem?
Here is the code:
int counter = 0;
int maxAttempts = (config.getProperty("maxAttempts") != null ? Integer.parseInt(config
        .getProperty("maxAttempts")) : 100);
Proxy proxy = null;
while (counter < maxAttempts) {
    try {
        Type proxyType = Proxy.Type.HTTP;
        String proxyIP = "";
        int proxyPort;
        int proxyIndex = getRandomNumber(1, httpProxies.size());
        if(httpProxies.get(proxyIndex).split(":").length == 4){
            proxyIP = httpProxies.get(proxyIndex).split(":")[0];
            proxyPort = Integer.parseInt(httpProxies.get(proxyIndex).split(":")[1]);
            if (httpProxies.get(proxyIndex).split(":").length == 3) {
                if (httpProxies.get(proxyIndex).split(":")[2].toLowerCase().contains("socks"))
                    proxyType = Proxy.Type.SOCKS;
            }
        }else{
            counter = counter - 1;
            throw new Exception("Escapeing for IP --- "+httpProxies.get(proxyIndex));
        }
        URL url = new URL(urlSring);
        InetSocketAddress inetSocketAddress = new InetSocketAddress(proxyIP, proxyPort);
        proxy = new Proxy(proxyType,inetSocketAddress);
        int userAgentIndex = getRandomNumber(1, userAgents.size());
        logger.info("Attempt = " + counter + " using proxy " + httpProxies.get(proxyIndex) + " (" + proxyType.name()
                + ") for url " + urlSring);
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection(proxy);
        if (config.getProperty("connectionTimeoutInMilliSecs") != null)
            urlConnection
                    .setConnectTimeout(Integer.parseInt(config.getProperty("connectionTimeoutInMilliSecs")));
        else
            urlConnection.setConnectTimeout(CONNECTION_TIMEOUT_VALUE);
        if (config.getProperty("readTimeoutInMilliSecs") != null)
            urlConnection.setReadTimeout(Integer.parseInt(config.getProperty("readTimeoutInMilliSecs")));
        else
            urlConnection.setReadTimeout(READ_TIMEOUT_VALUE);
        System.setProperty("http.agent", "");
        urlConnection.setRequestProperty("User-Agent", "");
        urlConnection.setRequestProperty("User-Agent", userAgents.get(userAgentIndex));
        urlConnection.addRequestProperty("Accept-Encoding", "gzip, deflate, br"); // to avoid server returned http response code 403
        urlConnection.setInstanceFollowRedirects(true);
        //Few Thread hang here for ever
        InputStream in = urlConnection.getInputStream();
        if(null != urlConnection.getContentEncoding() && urlConnection.getContentEncoding().equals("gzip")){
            in = new GZIPInputStream(in);
        }
        String output = IOUtils.toString(in, Charset.forName("UTF-8").name());
        logger.info("Proxy Address:-"+proxy.address()+ " HTTP Response Code : " + urlConnection.getResponseCode() + " HTTP Response Message : "
                + urlConnection.getResponseMessage() + " for url ---" + urlSring);
        logger.info("Success scraping for url --- "+urlSring+ " --- using proxy --- "+httpProxies.get(proxyIndex));
        // Close Input Stream
        if(in != null){
            in.close();
        }
        // Close url connection and release underlying socket if exists.
        if(urlConnection != null){
            urlConnection.disconnect();
        }
        url = null;
        urlConnection = null;
        return output;
    } catch (Exception e) {
        logger.info(e);
        counter++;
        /*
         * logger.info("Exception : " + e.getMessage() + " while using proxy " + proxy.address() +
         * ".Trying next proxy.");
         */
        if (config.getProperty("shouldSleepBetweenRequests") != null
                && config.getProperty("shouldSleepBetweenRequests").equalsIgnoreCase("true")) {
            Random r = new Random();
            int low = config.getProperty("minSleepTime") != null ? Integer.parseInt(config
                    .getProperty("minSleepTime")) : 0;
            int high = config.getProperty("maxSleepTime") != null ? Integer.parseInt(config
                    .getProperty("maxSleepTime")) : 5;
            int timeToSleep = r.nextInt(high - low) + low;
            logger.info("Sleeping for " + timeToSleep + " seconds ... ");
            try {
                Thread.sleep(timeToSleep * 1000);
            } catch (InterruptedException e1) {
                e1.printStackTrace();
            }
        }
    }
}
if (counter >= maxAttempts)
    logger.info("Stoping after " + maxAttempts + " attempts ...for url "+ urlSring);
return "";
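Please share your thoughts on how to solve this. I do not want to kill the hung threads; instead, I would like to implement some kind of timeout for this scenario, if possible.
Answer 0: (score: 1)
Try using a more sophisticated HTTP client. With Jetty, for example, you can set a timeout on the socket connection: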
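import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse; // package as in Jetty 9.x; it may differ in other versions

HttpClient httpClient = new HttpClient();
// Socket connection timeout in ms, configured before the client is started
httpClient.setConnectTimeout(500);
httpClient.start();
// One-liner:
httpClient.GET("http://localhost:8080/").getStatus();
// Building a request with a timeout for the whole request/response conversation
ContentResponse response = httpClient.newRequest("http://localhost:8080")
        .timeout(5, TimeUnit.SECONDS)
        .send();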
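If adding Jetty is not an option, a similar per-request timeout can be sketched with the JDK's own java.net.http.HttpClient. This is a swapped-in technique and it assumes Java 11 or later. The class name DeadlineFetch, the fetch helper, and the local "silent" socket that stands in for a stalled proxy are all hypothetical names used only for this illustration. Treat it as a minimal sketch rather than a drop-in replacement for the code in the question.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

class DeadlineFetch {

    // One shared client; the connect timeout only bounds establishing the connection.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /**
     * Hypothetical helper: returns the response body, or "" when no response
     * arrives within the given timeout or another I/O error occurs.
     */
    static String fetch(String url, Duration timeout) throws InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout) // HttpTimeoutException if no response is received in time
                .GET()
                .build();
        try {
            return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (HttpTimeoutException e) {
            return ""; // the peer accepted the connection but never answered in time
        } catch (IOException e) {
            return ""; // connection refused, reset, and similar failures
        }
    }

    public static void main(String[] args) throws Exception {
        // A local socket that accepts TCP connections but never sends a response.
        try (ServerSocket silent = new ServerSocket(0)) {
            long start = System.nanoTime();
            String body = fetch("http://127.0.0.1:" + silent.getLocalPort() + "/",
                    Duration.ofSeconds(1));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("empty=" + body.isEmpty() + " elapsedMs=" + elapsedMs);
        }
    }
}
```

The request timeout here bounds the wait for the response to arrive; whether it also covers a slow body download may depend on the JDK version, so a very slow stream could still need an extra guard. An HTTP proxy can be supplied with HttpClient.Builder.proxy(ProxySelector.of(new InetSocketAddress(host, port))).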