Why do I see a lot of TimeoutExceptions if any one server goes down?

Time: 2014-08-30 05:44:39

Tags: java multithreading thread-safety atomicity futuretask

Here is my DataClientFactory class.

public class DataClientFactory {
    public static IClient getInstance() {
        return ClientHolder.INSTANCE;
    }

    private static class ClientHolder {
        private static final DataClient INSTANCE = new DataClient();
        static {
            new DataScheduler().startScheduleTask();
        }
    }
}

Here is my DataClient class.

public class DataClient implements IClient {

    private ExecutorService service = Executors.newFixedThreadPool(15);
    private RestTemplate restTemplate = new RestTemplate();

    // for initialization purpose
    public DataClient() {
        try {
            new DataScheduler().callDataService();
        } catch (Exception ex) { // swallow the exception
            // log exception
        }
    }

    @Override
    public DataResponse getDataSync(DataKey dataKeys) {
        DataResponse response = null;
        try {
            Future<DataResponse> handle = getDataAsync(dataKeys);
            response = handle.get(dataKeys.getTimeout(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // log error
            response = new DataResponse(null, DataErrorEnum.CLIENT_TIMEOUT, DataStatusEnum.ERROR);
        } catch (Exception e) {
            // log error
            response = new DataResponse(null, DataErrorEnum.ERROR_CLIENT, DataStatusEnum.ERROR);
        }

        return response;
    }

    @Override
    public Future<DataResponse> getDataAsync(DataKey dataKeys) {
        Future<DataResponse> future = null;
        try {
            DataTask dataTask = new DataTask(dataKeys, restTemplate);
            future = service.submit(dataTask);
        } catch (Exception ex) {
            // log error
        }

        return future;
    }
}

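I get my client instance from the factory above, as shown below, and then call the getDataSync method, passing it a DataKey object. The DataKey object holds the userId and timeout values. The task is then submitted to the executor, which runs the call method of my DataTask class while handle.get waits for the result.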

IClient dataClient = DataClientFactory.getInstance();

long userid = 1234L;
long timeout_ms = 500;

DataKey keys = new DataKey.Builder().setUserId(userid).setTimeout(timeout_ms)
            .remoteFlag(false).secondaryFlag(true).build();

// call getDataSync method
DataResponse dataResponse = dataClient.getDataSync(keys);
System.out.println(dataResponse);

Here is my DataTask class, which has all the logic -

public class DataTask implements Callable<DataResponse> {

    private DataKey dataKeys;
    private RestTemplate restTemplate;

    public DataTask(DataKey dataKeys, RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
        this.dataKeys = dataKeys;
    }

    @Override
    public DataResponse call() {

        DataResponse dataResponse = null;
        ResponseEntity<String> response = null;

        int serialId = getSerialIdFromUserId();

        boolean remoteFlag = dataKeys.isRemoteFlag();
        boolean secondaryFlag = dataKeys.isSecondaryFlag();

        List<String> hostnames = new LinkedList<String>();

        Mappings mappings = ClientData.getMappings(dataKeys.whichFlow());

        String localPrimaryHostIPAdress = null;
        String remotePrimaryHostIPAdress = null;
        String localSecondaryHostIPAdress = null;
        String remoteSecondaryHostIPAdress = null;

        // use the mappings object and serialId to look up the addresses above, then
        // populate the hostnames list based on remoteFlag and secondaryFlag

        if (remoteFlag && secondaryFlag) {
            hostnames.add(localPrimaryHostIPAdress);
            hostnames.add(localSecondaryHostIPAdress);
            hostnames.add(remotePrimaryHostIPAdress);
            hostnames.add(remoteSecondaryHostIPAdress);
        } else if (remoteFlag && !secondaryFlag) {
            hostnames.add(localPrimaryHostIPAdress);
            hostnames.add(remotePrimaryHostIPAdress);
        } else if (!remoteFlag && !secondaryFlag) {
            hostnames.add(localPrimaryHostIPAdress);
        } else if (!remoteFlag && secondaryFlag) {
            hostnames.add(localPrimaryHostIPAdress);
            hostnames.add(localSecondaryHostIPAdress);
        }

        for (String hostname : hostnames) {
            // If host name is null or host name is in local block host list, skip sending request to this host
            if (hostname == null || ClientData.isHostBlocked(hostname)) {
                continue;
            }

            try {
                String url = generateURL(hostname);
                response = restTemplate.exchange(url, HttpMethod.GET, dataKeys.getEntity(), String.class);

                // make DataResponse

                break;

            } catch (HttpClientErrorException ex) {
                // make DataResponse
                return dataResponse;
            } catch (HttpServerErrorException ex) {
                // make DataResponse
                return dataResponse;
            } catch (RestClientException ex) {
                // If it comes here, then it means some of the servers are down.
                // Add this server to block host list 
                ClientData.blockHost(hostname);
                // log an error

            } catch (Exception ex) {
                // If it comes here, then something unexpected has happened.
                // log an error
                // make DataResponse
            }
        }

        return dataResponse;
    }

    private String generateURL(final String hostIPAdress) {
        // make an url
    }


    private int getSerialIdFromUserId() {
        // get the id
    }
}

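Based on the userId, I get the serialId and then build the list of hostnames I am supposed to call, depending on the flags passed in. I then iterate over the hostnames list and call the servers. Say I have four hostnames (A, B, C, D) in the linked list: I call A first, and if I get data back, I return the DataResponse. But if A is down, I need to add A to the block list immediately so that no other thread calls hostname A, and then call hostname B, get the data, and return the response (or repeat the same thing if B is also down).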
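
The failover loop described above can be sketched in plain Java, without RestTemplate. This is only an illustration: HostCall and FailoverSketch are hypothetical names that do not appear in the original code, and every failure is treated as "host down", whereas DataTask blocks a host only on RestClientException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the HTTP call; not part of the original code.
interface HostCall {
    String fetch(String hostname) throws Exception;
}

final class FailoverSketch {
    // Thread-safe set of hosts that failed recently.
    private final Set<String> blocked =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    boolean isBlocked(String hostname) {
        return blocked.contains(hostname);
    }

    // Tries each host in order and returns the first successful response,
    // or null if every host was skipped or failed.
    String firstResponse(List<String> hostnames, HostCall call) {
        for (String hostname : hostnames) {
            if (hostname == null || blocked.contains(hostname)) {
                continue; // skip unknown or blocked hosts
            }
            try {
                return call.fetch(hostname);
            } catch (Exception ex) {
                // Assumption: any failure means the host is down, so block it right away.
                blocked.add(hostname);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FailoverSketch sketch = new FailoverSketch();
        HostCall call = hostname -> {
            if ("A".equals(hostname)) {
                throw new Exception("A is down");
            }
            return "data from " + hostname;
        };
        System.out.println(sketch.firstResponse(Arrays.asList("A", "B", "C", "D"), call));
        System.out.println(sketch.isBlocked("A"));
    }
}
```

Note that a host is blocked only after fetch has actually failed. If the failure itself takes longer than the caller is willing to wait, the caller gives up before the host ever reaches the block list.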

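I also have a background thread that runs every 10 minutes. It is started as soon as we get the client instance from the factory, and it parses the response of another service URL to get the list of blocked hostnames that we are not supposed to call. Since it runs every 10 minutes, if a server goes down, the list reflects that only after up to 10 minutes. In general, if A goes down, my service will report A in the block list of hostnames, and once A comes back up, the list will likewise be updated after 10 minutes.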
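
A minimal sketch of such a periodic refresh, using only the JDK, might look like the following. BlockListRefresher and its Supplier-based source are assumptions made for illustration; the real code fetches and parses a service URL instead. The sketch covers only the refresh half, not the immediate blocking done from the worker threads.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class BlockListRefresher {
    // Readers always see one complete snapshot of the block list.
    private final AtomicReference<Set<String>> blocked =
            new AtomicReference<Set<String>>(Collections.<String>emptySet());
    // Hypothetical source of the block list (stands in for the service call).
    private final Supplier<Set<String>> source;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    BlockListRefresher(Supplier<Set<String>> source) {
        this.source = source;
    }

    // One refresh: copy the fresh list and swap it in atomically.
    void refreshOnce() {
        try {
            Set<String> fresh = new HashSet<String>(source.get());
            blocked.set(Collections.unmodifiableSet(fresh));
        } catch (RuntimeException ex) {
            // Keep the previous list. An exception escaping the task would
            // suppress all later runs of scheduleAtFixedRate.
        }
    }

    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::refreshOnce, 0, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    boolean isBlocked(String hostname) {
        return blocked.get().contains(hostname);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockListRefresher refresher =
                new BlockListRefresher(() -> Collections.singleton("A"));
        refresher.start(10, TimeUnit.MINUTES); // first run is scheduled with no initial delay
        Thread.sleep(200);
        System.out.println(refresher.isBlocked("A"));
        refresher.stop();
    }
}
```

Between two refreshes the snapshot is stale by design, which matches the 10 minute delay described above.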

Here is my background thread code, DataScheduler -

public class DataScheduler {

    private RestTemplate restTemplate = new RestTemplate();
    private static final Gson gson = new Gson();

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public void startScheduleTask() {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    callDataService();
                } catch (Exception ex) {
                    // log an error
                }
            }
        }, 0, 10L, TimeUnit.MINUTES);
    }

    public void callDataService() throws Exception {
        String url = null;

        // execute the url and get the responseMap from it as a string

        parseResponse(responseMap);
    }


    private void parseResponse(Map<FlowsEnum, String> responses) throws Exception {

        // .. some code here to calculate partitionMappings

        // block list of hostnames 
        Map<String, List<String>> coloExceptionList = gson.fromJson(response.split("blocklist=")[1], Map.class);
        for (Map.Entry<String, List<String>> entry : coloExceptionList.entrySet()) {
            for (String hosts : entry.getValue()) {
                blockList.add(hosts);
            }
        }

        if (update) {
            ClientData.setAllMappings(partitionMappings);
        }

        // update the block list of hostnames
        if (!DataUtils.isEmpty(responses)) {
            ClientData.replaceBlockedHosts(blockList);
        }
    }
}

Here is my ClientData class, which holds all the information: the block list of hostnames and the partitionMappings details (used to get the list of valid hostnames).

public class ClientData {

    private static final AtomicReference<ConcurrentHashMap<String, String>> blockedHosts = new AtomicReference<ConcurrentHashMap<String, String>>(
            new ConcurrentHashMap<String, String>());


    // some code here to set the partitionMappings by using CountDownLatch 
    // so that read is blocked for first time reads

    public static boolean isHostBlocked(String hostName) {
        return blockedHosts.get().contains(hostName);
    }

    public static void blockHost(String hostName) {
        blockedHosts.get().put(hostName, hostName);
    }

    public static void replaceBlockedHosts(List<String> blockList) {
        ConcurrentHashMap<String, String> newBlockedHosts = new ConcurrentHashMap<>();
        for (String hostName : blockList) {
            newBlockedHosts.put(hostName, hostName);
        }
        blockedHosts.set(newBlockedHosts);
    }
}

Problem statement:

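When all the servers are up (A, B, C, D, for example), the code above works fine and I don't see any TimeoutException from handle.get. But if one server (A), which I was supposed to call from the main thread, goes down, I start seeing a lot of TimeoutExceptions, that is, a huge number of client timeouts.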

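I am not sure why this is happening. In general this shouldn't happen, right? As soon as a server goes down it is added to the blockList, and then no thread should call that server; it should try the next server in the list instead. So it should be a smooth process, and once those servers are back up, the blockList is updated by the background thread and calls to them can resume.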

Is there any problem in my code above that could cause this? Any suggestions would be of great help.

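In general, what I am trying to do is: build a list of hostnames from the user id that is passed in, using the mappings object; call the first hostname and get the response; and if that hostname is down, add it to the block list and call the second hostname in the list.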

Here is the stack trace I am seeing -

java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
	at java.util.concurrent.FutureTask.get(FutureTask.java:119)
	at com.host.client.DataClient.getDataSync(DataClient.java:20)

Note: multiple userIds can map to the same server, which means server A can be resolved for many userIds.

2 Answers:

Answer 0: (score: 0)

In the DataClient class, at the following line:

public class DataClient implements IClient {

----code code---

        Future<DataResponse> handle = getDataAsync(dataKeys);

//BELOW LINE IS PROBLEM

        response = handle.get(dataKeys.getTimeout(), TimeUnit.MILLISECONDS); // <--- HERE
    } catch (TimeoutException e) {
        // log error
        response = new DataResponse(null, DataErrorEnum.CLIENT_TIMEOUT, DataStatusEnum.ERROR);
    } catch (Exception e) {
        // log error
        response = new DataResponse(null, DataErrorEnum.ERROR_CLIENT, DataStatusEnum.ERROR);

----code code-----

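You have given handle.get(...) a timeout, and it is timing out before your REST connection responds. The REST connection itself may or may not time out, but because you time out of the future's get method before the thread finishes executing, blocking the host has no visible effect, even though the code inside DataTask's call method may be executing as expected. Hope this helps.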
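
The effect described in this answer can be reproduced with the JDK alone. The sketch below is an assumption about what may be happening, not a confirmed diagnosis: Thread.sleep stands in for a REST call to an unresponsive host, the pool has one thread instead of 15, and FutureTimeoutSketch is a hypothetical name. It shows that a timed-out get leaves the worker thread busy, and that a fast task queued behind it can time out too.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureTimeoutSketch {

    // Returns true if get() timed out while the task was still running.
    static boolean slowTaskTimesOut(long taskMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<String> handle = pool.submit(sleepThenReturn(taskMillis, "slow"));
            try {
                handle.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                // The caller gave up, but the worker thread is still occupied.
                return !handle.isDone();
            } catch (Exception e) {
                return false;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // Returns true if a fast task times out only because it waits behind a slow one.
    static boolean queuedFastTaskTimesOut(long slowMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(sleepThenReturn(slowMillis, "slow")); // occupies the only thread
            Future<String> fast = pool.submit(sleepThenReturn(0, "fast"));
            try {
                fast.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (Exception e) {
                return false;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // Simulates a call that takes the given time before returning.
    private static Callable<String> sleepThenReturn(final long millis, final String value) {
        return new Callable<String>() {
            @Override
            public String call() throws InterruptedException {
                Thread.sleep(millis);
                return value;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(slowTaskTimesOut(500, 50));
        System.out.println(queuedFastTaskTimesOut(500, 50));
    }
}
```

If this is what is happening in DataClient, then requests for healthy hosts can time out as well once enough pool threads are stuck on the failed host. Bounding the connect and read time of the HTTP client below the caller's timeout would give the RestClientException branch a chance to run and block the host sooner.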

Answer 1: (score: 0)

You asked for suggestions, so here are a few:

1.) Unexpected return value
The method may unexpectedly return FALSE

if (ClientData.isHostBlocked(hostname)) //this may return always false! please check

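2.) Exception handling
Are you really sure that a RestClientException occurs? The host is added to the block list only when this exception occurs! The code you posted seems to leave out logging (it is commented out!)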

        ...catch (HttpClientErrorException ex) {
            // make DataResponse
            return dataResponse;
        } catch (HttpServerErrorException ex) {
            // make DataResponse
            return dataResponse;
        } catch (RestClientException ex) {
            // If it comes here, then it means some of the servers are down.
            // Add this server to block host list 
            ClientData.blockHost(hostname);
            // log an error

        } catch (Exception ex) {
            // If it comes here, then something unexpected has happened.
            // log an error
            // make DataResponse
        }
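
Regarding point 1, the behaviour of the lookup can be checked in isolation. ConcurrentHashMap.contains is a legacy method that tests values, not keys. Since ClientData stores the host name as both key and value, the value lookup should still find blocked hosts there, but containsKey states the intent directly. BlockedHostLookup below is a hypothetical helper written only for this check.

```java
import java.util.concurrent.ConcurrentHashMap;

final class BlockedHostLookup {

    // Legacy value lookup, as used in ClientData.isHostBlocked.
    static boolean byValue(ConcurrentHashMap<String, String> hosts, String hostname) {
        return hosts.contains(hostname);
    }

    // Key lookup; does not depend on what is stored as the value.
    static boolean byKey(ConcurrentHashMap<String, String> hosts, String hostname) {
        return hosts.containsKey(hostname);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> hosts = new ConcurrentHashMap<String, String>();
        hosts.put("hostA", "hostA"); // key and value are the same, as in ClientData
        hosts.put("hostB", "down");  // key and value differ

        System.out.println(byValue(hosts, "hostA")); // true
        System.out.println(byKey(hosts, "hostA"));   // true
        System.out.println(byValue(hosts, "hostB")); // false: "hostB" is not a value
        System.out.println(byKey(hosts, "hostB"));   // true
    }
}
```

The two lookups disagree only when key and value differ, so with the code as posted this is more of a readability concern than the cause of the timeouts.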