Reading data from an API

Time: 2013-08-16 01:51:26

Tags: java json optimization csv

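I wrote a function that reads some data from an external API. As it reads a file from disk, it calls that API once for each line. I want to optimize my code for large files (35,000 records). Can you suggest how to do that?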

Here is my code.

public void readCSVFile() {

    try {

        br = new BufferedReader(new FileReader(getFileName()));

        while ((line = br.readLine()) != null) {


            String[] splitLine = line.split(cvsSplitBy);

            String campaign = splitLine[0];
            String adGroup =  splitLine[1];
            String url = splitLine[2];              
            long searchCount = getSearchCount(url);             

            StringBuilder sb = new StringBuilder();
            sb.append(campaign + ",");
            sb.append(adGroup + ",");               
            sb.append(searchCount + ",");               
            writeToFile(sb, getNewFileName());

        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}

private long getSearchCount(String url) {
    long recordCount = 0;
    try {

        DefaultHttpClient httpClient = new DefaultHttpClient();

        HttpGet getRequest = new HttpGet(
                "api.com/querysearch?q="
                        + url);
        getRequest.addHeader("accept", "application/json");

        HttpResponse response = httpClient.execute(getRequest);

        if (response.getStatusLine().getStatusCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + response.getStatusLine().getStatusCode());
        }

        BufferedReader br = new BufferedReader(new InputStreamReader(
                (response.getEntity().getContent())));

        String output;

        while ((output = br.readLine()) != null) {
            try {

                JSONObject json = (JSONObject) new JSONParser()
                        .parse(output);
                JSONObject result = (JSONObject) json.get("result");
                recordCount = (long) result.get("count");
                System.out.println(url + "=" + recordCount);

            } catch (Exception e) {
                System.out.println(e.getMessage());
            }

        }

        httpClient.getConnectionManager().shutdown();

    } catch (Exception e) {
        e.printStackTrace();
    }
    return recordCount;

}

1 answer:

Answer 0 (score: 1)

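Because remote calls are much slower than local disk access, you need to parallelize or batch the remote calls in some way. If you cannot make batched calls to the remote API, but it allows multiple concurrent reads, then you might want to use something like a thread pool to make the remote calls: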

public void readCSVFile() throws Exception {
    // exception handling ignored for space
    br = new BufferedReader(new FileReader(getFileName()));
    List<Future<String>> futures = new ArrayList<Future<String>>();
    ExecutorService pool = Executors.newFixedThreadPool(5);

    // Submit one remote lookup per CSV line; at most 5 run at the same time.
    while ((line = br.readLine()) != null) {
        final String[] splitLine = line.split(cvsSplitBy);
        futures.add(pool.submit(new Callable<String>() {
            public String call() {
                long searchCount = getSearchCount(splitLine[2]);
                return new StringBuilder()
                    .append(splitLine[0]+ ",")
                    .append(splitLine[1]+ ",")
                    .append(searchCount + ",")
                    .toString();
            }
        }));
    }

    // Read the results in submission order, so output rows keep the input order.
    for (Future<String> fs: futures) {
        writeToFile(fs.get(), getNewFileName());
    }

    pool.shutdown();
}
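
To try the thread-pool pattern in isolation, here is a minimal self-contained sketch. It is not the asker's code: the class name `ParallelLookupSketch`, the `processLines` helper, and the stubbed `getSearchCount` (which simply returns the URL length instead of calling any API) are assumptions made for illustration, and the `.example` URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLookupSketch {

    // Hypothetical stand-in for the remote call: returns the URL length
    // so the example runs without network access.
    static long getSearchCount(String url) {
        return url.length();
    }

    // Runs one lookup per line on a fixed-size pool and returns the
    // output rows in the same order as the input lines.
    static List<String> processLines(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                final String[] cols = line.split(",");
                futures.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() {
                        long count = getSearchCount(cols[2]);
                        return cols[0] + "," + cols[1] + "," + count + ",";
                    }
                }));
            }
            List<String> rows = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    rows.add(future.get()); // blocks until this lookup finishes
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("lookup failed", e);
                }
            }
            return rows;
        } finally {
            pool.shutdown(); // release the worker threads even on failure
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "c1,g1,http://a.example",
                "c2,g2,http://bb.example");
        for (String row : processLines(lines, 5)) {
            System.out.println(row);
        }
    }
}
```

Because the futures are read back in submission order, the rows come out in input order even when the lookups finish out of order. The pool size (5 here, as in the answer) would need tuning against whatever concurrency the remote API tolerates.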

Ideally, though, you would do a single batch read from the remote API, if that is possible.
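
If the API did offer a batch endpoint, the loop could be restructured roughly as follows. This is only a sketch under that assumption: `getSearchCounts` is a hypothetical batch call (stubbed here with the URL length so the code runs offline), and `BatchLookupSketch`, `partition`, and the batch size are illustrative names and values, not part of the original code or of any real API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchLookupSketch {

    // Hypothetical batch endpoint: one call returns counts for many URLs.
    // Stubbed with the URL length so the example needs no network access.
    static Map<String, Long> getSearchCounts(List<String> urls) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String url : urls) {
            counts.put(url, (long) url.length());
        }
        return counts;
    }

    // Splits the input into consecutive chunks of at most batchSize items.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Builds the output rows using one (stubbed) remote call per batch.
    static List<String> processLines(List<String> lines, int batchSize) {
        List<String> rows = new ArrayList<>();
        for (List<String> batch : partition(lines, batchSize)) {
            List<String> urls = new ArrayList<>();
            for (String line : batch) {
                urls.add(line.split(",")[2]);
            }
            Map<String, Long> counts = getSearchCounts(urls);
            for (String line : batch) {
                String[] cols = line.split(",");
                rows.add(cols[0] + "," + cols[1] + "," + counts.get(cols[2]) + ",");
            }
        }
        return rows;
    }
}
```

With a batch size of 500, for example, the 35,000 records would need 70 remote calls instead of 35,000.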