Question

I've noticed that java.util.Scanner is very slow when reading large files (in my case, CSV files).

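I want to change the way I currently read the file in order to improve performance. Below is what I have at the moment. Note that I'm developing for Android: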

    InputStreamReader inputStreamReader;
    try {
        inputStreamReader = new InputStreamReader(context.getAssets().open("MyFile.csv"));
        Scanner inputStream = new Scanner(inputStreamReader);
        inputStream.nextLine(); // Ignores the first line
        while (inputStream.hasNext()) {
            String data = inputStream.nextLine(); // Gets a whole line
            String[] line = data.split(","); // Splits the line up into a string array

            if (line.length > 1) {
                // Do stuff, e.g:
                String value = line[1];
            }
        }
        inputStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

Using Traceview, I found that the main performance bottlenecks are java.util.Scanner.nextLine() and java.util.Scanner.hasNext().

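I've looked at other questions (such as this one), and I've come across some CSV readers, such as Apache Commons CSV, but there doesn't seem to be much information on how to use them, and I'm not sure how much faster they would be.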

I've also heard about using FileReader and BufferedReader in answers such as this one, but again, I don't know whether the improvement would be significant.

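My file is about 30,000 lines long. With my current code (above), it takes at least 1 minute to read values from about 600 lines down, so I haven't timed how long it would take to read values from more than 2,000 lines down. Sometimes, while reading, the Android app becomes unresponsive and crashes.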

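Although I could simply change parts of my code and see for myself, I'd like to know whether there are any faster alternatives I haven't mentioned, or whether I should just use BufferedReader and FileReader. Would it be faster to split the large file into smaller files and choose which one to read depending on the information I want to retrieve? Preferably, I'd also like to know why the fastest method is the fastest (i.e. what makes it fast).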

Answer 1

uniVocity-parsers has the fastest CSV parser you'll find (2x faster than OpenCSV, 3x faster than Apache Commons CSV), with many unique features.

Here is a simple example of how to use it:

// named parserSettings to match the snippets further down
CsvParserSettings parserSettings = new CsvParserSettings(); // many options here, have a look at the tutorial

CsvParser parser = new CsvParser(parserSettings);

// parses all rows in one go
List<String[]> allRows = parser.parseAll(new FileReader(new File("your/file.csv")));

To speed up processing, you can select only the columns you are interested in:

parserSettings.selectFields("Column X", "Column A", "Column Y");

Normally, you should be able to parse 4 million rows in around 2 seconds. With column selection, the speed improves by roughly 30%.

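It's even faster if you use a RowProcessor. There are many implementations available out of the box for handling conversions to objects, POJOs, etc. The documentation explains all the available features. It works like this: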

// let's get the values of all columns using a column processor
ColumnProcessor rowProcessor = new ColumnProcessor();
parserSettings.setRowProcessor(rowProcessor);

//the parse() method will submit all rows to the row processor
parser.parse(new FileReader(new File("/examples/example.csv")));

//get the result from your row processor:
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();

We also built a simple speed comparison project here.

Answer 2

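Your code is fine for loading big files. However, when an operation is going to take longer than you expect, it's good practice to run it in a task rather than on the UI thread, to prevent any lack of responsiveness.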

The AsyncTask class helps to do that:

private class LoadFilesTask extends AsyncTask<String, Integer, Long> {
    protected Long doInBackground(String... str) {
        long lineNumber = 0;
        InputStreamReader inputStreamReader;
        try {
            inputStreamReader = new
                    InputStreamReader(context.getAssets().open(str[0]));
            Scanner inputStream = new Scanner(inputStreamReader);
            inputStream.nextLine(); // Ignores the first line

            while (inputStream.hasNext()) {
                lineNumber++;
                String data = inputStream.nextLine(); // Gets a whole line
                String[] line = data.split(","); // Splits the line up into a string array

                if (line.length > 1) {
                    // Do stuff, e.g:
                    String value = line[1];
                }
            }
            inputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return lineNumber;
    }

    //If you need to show the progress use this method
    protected void onProgressUpdate(Integer... progress) {
        setYourCustomProgressPercent(progress[0]);
    }

    //This method is triggered at the end of the process, in your case when the loading has finished
    protected void onPostExecute(Long result) {
        showDialog("File Loaded: " + result + " lines");
    }
}

...and execute it as:

new LoadFilesTask().execute("MyFile.csv");
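
Note that AsyncTask was deprecated in Android API level 30. The same idea (do the file loading off the UI thread) can be sketched with plain java.util.concurrent. This is only an illustration under assumptions: the class name BackgroundCsvLoad and the helpers countDataLines and loadInBackground are made up for this example, a StringReader stands in for the asset stream, and posting the result back to the Android main thread is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BackgroundCsvLoad {

    // Hypothetical helper: counts the data lines, excluding the header line.
    static long countDataLines(Reader source) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header line
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: runs the load on a worker thread owned by the executor.
    static Future<Long> loadInBackground(ExecutorService executor, final Reader source) {
        Callable<Long> task = () -> countDataLines(source);
        return executor.submit(task);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // A StringReader stands in for the real file in this sketch.
            Future<Long> result = loadInBackground(executor, new StringReader("h\na\nb\nc\n"));
            // get() blocks the caller; in an Android app you would instead
            // hand the result back to the main thread rather than wait here.
            System.out.println("File loaded: " + result.get() + " lines");
        } finally {
            executor.shutdown();
        }
    }
}
```

On Android, the Reader passed in would be the InputStreamReader built from context.getAssets().open(...), as in the task above.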

Answer 3

You should use a BufferedReader:

BufferedReader reader = null;
try {
    reader = new BufferedReader(new InputStreamReader(context.getAssets().open("MyFile.csv")));
    reader.readLine(); // Ignores the first line
    String data;
    while ((data = reader.readLine()) != null) { // Gets a whole line
        String[] line = data.split(","); // Splits the line up into a string array
        if (line.length > 1) {
            // Do stuff, e.g:
            String value = line[1];
        }
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (reader != null) {
        try {
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } 
    } 
}
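
As for why this tends to be faster: Scanner matches its input against regular expressions for every hasNext()/nextLine() call, while BufferedReader reads the underlying stream in large chunks into an in-memory buffer and readLine() only scans that buffer for line terminators. A self-contained sketch of the same loop follows. It makes a few assumptions: the class name CsvSecondColumn and the helper readSecondColumn are invented for illustration, a StringReader stands in for the asset stream, and the CSV is assumed to have no quoted fields or embedded commas. It also replaces split(",") with indexOf, which avoids allocating an array per line when only one column is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CsvSecondColumn {

    // Hypothetical helper: returns the second field of every data row.
    // Assumes a simple CSV without quoted fields or embedded commas.
    static List<String> readSecondColumn(Reader source) throws IOException {
        List<String> values = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header line
            String line;
            while ((line = reader.readLine()) != null) {
                int first = line.indexOf(',');
                if (first < 0) {
                    continue; // no comma, so there is no second field
                }
                int second = line.indexOf(',', first + 1);
                int end = (second < 0) ? line.length() : second;
                values.add(line.substring(first + 1, end));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name,city\n1,Alice,Paris\n2,Bob,Berlin\nno-comma-line\n";
        System.out.println(readSecondColumn(new StringReader(csv))); // prints [Alice, Bob]
    }
}
```

If the file can contain quoted fields, a real CSV parser is the safer choice; this sketch only covers the plain comma-separated case from the question.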

Fastest way to read a CSV file in Java