Java threads hang forever in a multithreaded program without throwing any exception

Time: 2016-08-05 13:54:22

Tags: java linux multithreading http tcp

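I have Java code that scrapes page content. I run 2500 threads at a time, and each thread has 100 URLs to scrape. Most threads finish successfully, but a few hang forever without throwing any exception. The production server runs Ubuntu. The code is stuck at the following line: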

    InputStream in = urlConnection.getInputStream();

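I have set both a connect timeout and a read timeout, and they generally work. For a small number of threads, however, even the read timeout has no effect and the thread hangs forever. I have tried several approaches, and none of them worked.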
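One possible explanation, which the question does not confirm, is that a read timeout bounds only the wait for each individual read, not the whole response. As long as the peer keeps sending a little data before each timeout expires, the total time is unbounded. The sketch below illustrates this with plain sockets and `setSoTimeout`; `ReadTimeoutDemo`, `startDripServer`, and `readAll` are hypothetical names, and the drip server is only a stand-in for a slow proxy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

class ReadTimeoutDemo {

    /** Starts a local server that sends one byte every gapMillis, count times, then closes. */
    static ServerSocket startDripServer(final int count, final long gapMillis) throws IOException {
        final ServerSocket server = new ServerSocket(0); // 0 = any free port
        Thread t = new Thread(() -> {
            try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                for (int i = 0; i < count; i++) {
                    out.write('x');
                    out.flush();
                    Thread.sleep(gapMillis);
                }
            } catch (IOException | InterruptedException ignored) {
                // Demo only: the server simply stops on any failure.
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    /** Reads until EOF with a per-read timeout; returns the total elapsed milliseconds. */
    static long readAll(int port, int soTimeoutMillis) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket("127.0.0.1", port)) {
            socket.setSoTimeout(soTimeoutMillis); // applies to each read() call separately
            InputStream in = socket.getInputStream();
            while (in.read() != -1) {
                // Discard the data; only the timing matters here.
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With ten bytes sent 200 ms apart and a 500 ms timeout, no single read waits long enough to time out, yet the whole transfer takes far longer than 500 ms. If this is what happens in production, an overall deadline per request is needed in addition to the read timeout.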

I even killed the hung threads with thread.stop() (a deprecated method), but the TCP connections of the hung threads still exist on the Linux server:

    java    7325 root 2675u  IPv4          284078467        0t0       TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:37068->104.131.210.5:22225 (ESTABLISHED)
    java    7325 root 2688u  IPv4          284077787        0t0       TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:38132->104.131.210.5:22225 (ESTABLISHED)
    java    7325 root 2723u  IPv4          284057771        0t0       TCP scrapper-new-instance-2.c.quantum-tracker-93805.internal:43661->104.131.210.5:22225 (ESTABLISHED)

Does anyone have an idea how I can debug and resolve this issue?
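For the debugging part, a thread dump shows exactly which frame each hung thread is blocked in. Running `jstack` against the Java process ID from the command line gives one; the sketch below does a similar check from inside the JVM. `StuckThreadFinder` and `threadsIn` are hypothetical names, and the class and method names to search for depend on the JDK version, so treat the ones in the usage note as assumptions to adjust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StuckThreadFinder {

    /** Returns the names of live threads whose stack contains a frame of the given class and method. */
    static List<String> threadsIn(String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }
}
```

Calling `StuckThreadFinder.threadsIn("sun.net.www.protocol.http.HttpURLConnection", "getInputStream")` periodically and logging names that appear in several consecutive samples would point at the threads that never return. Giving each worker thread a name that includes its current URL and proxy makes the result easier to act on.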

Here is the code:

    int counter = 0;
    int maxAttempts = (config.getProperty("maxAttempts") != null ? Integer.parseInt(config
                .getProperty("maxAttempts")) : 100);
    Proxy proxy = null;
    while (counter < maxAttempts) {
        try {
            Type proxyType = Proxy.Type.HTTP;
            String proxyIP = "";
            int proxyPort;

            int proxyIndex = getRandomNumber(1, httpProxies.size());

            if(httpProxies.get(proxyIndex).split(":").length == 4){
                proxyIP = httpProxies.get(proxyIndex).split(":")[0];
                proxyPort = Integer.parseInt(httpProxies.get(proxyIndex).split(":")[1]);

                if (httpProxies.get(proxyIndex).split(":").length == 3) {
                    if (httpProxies.get(proxyIndex).split(":")[2].toLowerCase().contains("socks"))
                        proxyType = Proxy.Type.SOCKS;
                }
            }else{
                counter = counter - 1;
                throw new Exception("Escapeing for IP --- "+httpProxies.get(proxyIndex));

            }

            URL url = new URL(urlSring);
            InetSocketAddress inetSocketAddress = new InetSocketAddress(proxyIP, proxyPort);
            proxy = new Proxy(proxyType,inetSocketAddress);

            int userAgentIndex = getRandomNumber(1, userAgents.size());

            logger.info("Attempt = " + counter + " using proxy " + httpProxies.get(proxyIndex) + " (" + proxyType.name()
                        + ") for url " + urlSring);

            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection(proxy);

            if (config.getProperty("connectionTimeoutInMilliSecs") != null)
                urlConnection
                            .setConnectTimeout(Integer.parseInt(config.getProperty("connectionTimeoutInMilliSecs")));
            else
                urlConnection.setConnectTimeout(CONNECTION_TIMEOUT_VALUE);

            if (config.getProperty("readTimeoutInMilliSecs") != null)
                urlConnection.setReadTimeout(Integer.parseInt(config.getProperty("readTimeoutInMilliSecs")));
            else
                urlConnection.setReadTimeout(READ_TIMEOUT_VALUE);


            System.setProperty("http.agent", "");

            urlConnection.setRequestProperty("User-Agent", "");
            urlConnection.setRequestProperty("User-Agent", userAgents.get(userAgentIndex));
            urlConnection.addRequestProperty("Accept-Encoding", "gzip, deflate, br"); // to avoid server returned http response code 403
            urlConnection.setInstanceFollowRedirects(true);

            // A few threads hang here forever
            InputStream in = urlConnection.getInputStream();

            if(null !=  urlConnection.getContentEncoding() && urlConnection.getContentEncoding().equals("gzip")){
                in = new GZIPInputStream(in);
            }

            String output = IOUtils.toString(in, Charset.forName("UTF-8").name());

            logger.info("Proxy Address:-"+proxy.address()+ " HTTP Response Code : " + urlConnection.getResponseCode() + " HTTP Response Message : "
                        + urlConnection.getResponseMessage() + " for url ---" + urlSring);



            logger.info("Success scraping for url --- "+urlSring+ " --- using proxy --- "+httpProxies.get(proxyIndex));
            // Close Input Stream
            if(in != null){
                in.close();
            }

            // Close url connection and release underlying socket if exists.
            if(urlConnection != null){
                urlConnection.disconnect();
            }

            url = null;
            urlConnection = null;
            return output;

        } catch (Exception e) {
            logger.info(e);
            counter++;
            /*
             * logger.info("Exception : " + e.getMessage() + " while using proxy " + proxy.address() +
             * ".Trying next proxy.");
             */

            if (config.getProperty("shouldSleepBetweenRequests") != null
                        && config.getProperty("shouldSleepBetweenRequests").equalsIgnoreCase("true")) {
                Random r = new Random();
                int low = config.getProperty("minSleepTime") != null ? Integer.parseInt(config
                            .getProperty("minSleepTime")) : 0;
                int high = config.getProperty("maxSleepTime") != null ? Integer.parseInt(config
                            .getProperty("maxSleepTime")) : 5;
                int timeToSleep = r.nextInt(high - low) + low;
                logger.info("Sleeping for " + timeToSleep + " seconds ... ");
                try {
                    Thread.sleep(timeToSleep * 1000);
                } catch (InterruptedException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }

    if (counter >= maxAttempts)
        logger.info("Stoping after " + maxAttempts + " attempts ...for url "+ urlSring);

    return "";

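Please share your thoughts on how to solve this. I do not want to kill the hung threads; instead, I would like to implement some kind of timeout for this scenario if possible.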
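One way to get such a timeout without killing threads is to run the blocking call on a worker thread and have the caller wait with a hard deadline. The sketch below is a minimal illustration, not a drop-in fix: `DeadlineFetch` and `fetch` are hypothetical names, and the caller is assumed to have already configured the `HttpURLConnection` (proxy, connect and read timeouts, headers) as in the code above. It also assumes that calling `disconnect()` from another thread closes the underlying socket and so makes the blocked read fail, which is worth verifying on the JVM in use. A thread interrupt alone generally does not unblock a blocking `java.net` socket read.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DeadlineFetch {

    // Daemon workers, so a read that stays stuck cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "deadline-fetch");
        t.setDaemon(true);
        return t;
    });

    /** Reads the whole response body, giving up after deadlineMillis in total. */
    static String fetch(final HttpURLConnection conn, long deadlineMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<String> result = POOL.submit(() -> {
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        });
        try {
            // The caller never waits longer than the deadline, whatever the socket does.
            return result.get(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);
            conn.disconnect(); // best effort: close the socket under the blocked read
            throw e;
        }
    }
}
```

In the retry loop, a `TimeoutException` from `fetch` could be handled like any other failure: log it, increment the attempt counter, and move on to the next proxy. Because the deadline covers connecting, waiting for headers, and reading the body together, it also bounds the case where each individual read stays under the read timeout.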

1 answer:

Answer 0: (score: 1)

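Try a more sophisticated HTTP client. With Jetty, for example, you can set a timeout for the socket connection as well as for the whole request: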

    // Imports assume Jetty 9.x package names
    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.api.ContentResponse;
    import java.util.concurrent.TimeUnit;

    HttpClient httpClient = new HttpClient();
    httpClient.start();

    // Socket connection timeout in ms
    httpClient.setConnectTimeout(500);

    // One-liner:
    httpClient.GET("http://localhost:8080/").getStatus();

    // Build a request with a timeout for the whole request/response conversation
    ContentResponse response = httpClient.newRequest("http://localhost:8080")
            .timeout(5, TimeUnit.SECONDS)
            .send();