Question

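I'm using the Java 8 Stream class to read a .csv file of about 500 MB. Almost all of the data has the same format, except for 2 instances I found. Each object spans 52 rows, which I store in an ArrayList and then add to a HashMap so that I can access them by key. I pass the HashMap to a different class to create an Excel file for each object, then clear the list right after the file is created and move on to the next object. The problem is that when an object has fewer rows, the Excel-creation class tries to get numbers from indices that don't exist, which throws a NullPointerException. Is there a way to skip those rows if a NullPointerException is thrown? I know that if this problem occurs I have to skip 52 rows.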

try
    {
        final String regex = "\\d*\\.?\\d+";
        Stream<String> lines = Files.lines( file, StandardCharsets.UTF_8 );
        for( String line : (Iterable<String>) lines.skip(currentLine)::iterator ){
            final Pattern pattern = Pattern.compile(regex);
            final Matcher matcher = pattern.matcher(line.substring(0));
            while (matcher.find()) {
                testPop.add(Double.parseDouble(matcher.group(0)));
            }               
            currentLine++;
            if(currentLine%52==0) {
                for(int i =0;i<52;i++) {
                    int date=4+29*i;
                    int a=13+29*i;
                    int b=6+29*i;
                    int c=15+29*i;
                    int d=16+29*i;
                    int e=8+29*i;
                    int f=17+29*i;
                    int g=14+29*i;
                    int h=7+29*i;
                    WeeklyCalculations.put(Integer.parseInt(String.valueOf((int)((testPop.get(date))/1))),new Calculations(testPop.get(a),3,1,testPop.get(b),testPop.get(c),testPop.get(d),testPop.get(e),testPop.get(f),testPop.get(g),testPop.get(h),testPop.get(date),WeeklyCalculations));
                }
                findZeroStockOuts();
                ExcelCreator x = new ExcelCreator(WeeklyCalculations,String.valueOf(((int)(testPop.get(1)/1))),String.valueOf(((int)(testPop.get(2)/1))), noStockouts, stockOuts);
                x.createExcel();
                testPop.clear();
                WeeklyCalculations.clear();
                counter++;
                System.out.println(counter + "/" + "67101 - "+TimeUnit.SECONDS.convert(System.nanoTime(), TimeUnit.NANOSECONDS));

            }
        }

    } catch (IOException ioe){
        ioe.printStackTrace();
    }
    catch(NullPointerException x) {
        readToExcel(currentLine+52);
    }

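I was able to skip them inside a loop, but that slows things down considerably: the file has about 3.5 million lines, and all of the already-processed lines have to be skipped again after every iteration. Is there an efficient way to do this?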

Answer 1

It is slow because you repeatedly read the file from the beginning. Fill in your code in the block below:

final String regex = "\\d*\\.?\\d+";
final Pattern pattern = Pattern.compile(regex);

try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
    final Iterator<String> iter = lines.iterator();

    for (int currentLine = 1; iter.hasNext(); currentLine++) {
        String line = iter.next();
        final Matcher matcher = pattern.matcher(line); // line.substring(0) is unnecessary; it returns the same string
        while (matcher.find()) {
            // testPop.add(Double.parseDouble(matcher.group(0)));
        }

        try {
            if (currentLine % 52 == 0) {
                for (int i = 0; i < 52; i++) {
                    // TODO
                }
            }

            // TODO:
        } catch (IOException ioe) {
            ioe.printStackTrace();
            while (currentLine % 52 != 0 && iter.hasNext()) {
                iter.next();
                currentLine++;
            }
        } catch (NullPointerException x) {
            // readToExcel(currentLine + 52);
            while (currentLine % 52 != 0 && iter.hasNext()) {
                iter.next();
                currentLine++;
            }
        }
    }
}
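
As an alternative to catching the exception, each block can be checked before it is indexed. The following is only a sketch under stated assumptions: the class name `BlockReader` and its methods are hypothetical, and the threshold is derived from the question's code, where the highest index used is 17 + 29 * 51 = 1496, so a usable block would need at least 1497 numbers. Note that `ArrayList.get` with an out-of-range index throws IndexOutOfBoundsException rather than NullPointerException, so a size check avoids depending on which exception surfaces. The sketch reads the lines once and does not try to realign blocks when a record has fewer than 52 lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockReader {
    // Same regex as in the question, compiled once.
    private static final Pattern NUMBER = Pattern.compile("\\d*\\.?\\d+");

    /** Reads up to linesPerBlock lines and returns all numbers found in them. */
    static List<Double> readBlock(Iterator<String> lines, int linesPerBlock) {
        List<Double> numbers = new ArrayList<>();
        for (int i = 0; i < linesPerBlock && lines.hasNext(); i++) {
            Matcher m = NUMBER.matcher(lines.next());
            while (m.find()) {
                numbers.add(Double.parseDouble(m.group()));
            }
        }
        return numbers;
    }

    /** Walks the lines once and returns {processed, skipped} block counts. */
    static int[] processAll(Iterator<String> lines, int linesPerBlock, int minNumbers) {
        int processed = 0;
        int skipped = 0;
        while (lines.hasNext()) {
            List<Double> block = readBlock(lines, linesPerBlock);
            if (block.size() < minNumbers) {
                skipped++;   // too few numbers: skip instead of indexing out of range
                continue;
            }
            // Assumption: the Calculations / ExcelCreator work from the question goes here.
            processed++;
        }
        return new int[] {processed, skipped};
    }

    public static void main(String[] args) {
        // Small demo: blocks of 2 lines, each needing at least 5 numbers.
        List<String> demo = Arrays.asList(
                "1,2,3", "4,5,6",      // 6 numbers: processed
                "7", "8",              // 2 numbers: skipped
                "9,10,11", "12,13,14"  // 6 numbers: processed
        );
        int[] result = processAll(demo.iterator(), 2, 5);
        System.out.println(result[0] + " processed, " + result[1] + " skipped");
    }
}
```

For the file in the question, the call would look like `processAll(lines.iterator(), 52, 1497)`, with `lines` being the stream returned by `Files.lines`.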

Here is a way to simplify the code using a fork of StreamEx:

final String regex = "\\d*\\.?\\d+";
final Pattern pattern = Pattern.compile(regex);

try (StreamEx<String> stream = StreamEx.ofLines(file, StandardCharsets.UTF_8)) {
    stream.splitToList(52).filter(l -> l.size() == 52).forEach(lines -> {
        lines.stream().forEach(line -> {
            final Matcher matcher = pattern.matcher(line); // line.substring(0) is unnecessary; it returns the same string
            while (matcher.find()) {
                // testPop.add(Double.parseDouble(matcher.group(0)));
            }
        });

        try {
            // TODO:
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (NullPointerException x) {
            // readToExcel(currentLine + 52);
        }
    });
} catch (IOException e) {
    e.printStackTrace();
}
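
If adding a library is not an option, similar fixed-size grouping can be approximated with the JDK alone. This is a minimal sketch, not the library's implementation: the `Chunker` class and its `chunks` helper are hypothetical names. The helper wraps an iterator lazily, so only one group of lines is held in memory at a time, and the last group may be shorter than the requested size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class Chunker {
    /** Lazily groups items into consecutive lists of at most `size` elements. */
    static <T> Iterator<List<T>> chunks(Iterator<T> items, int size) {
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return items.hasNext();
            }

            @Override
            public List<T> next() {
                if (!items.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> chunk = new ArrayList<>(size);
                while (chunk.size() < size && items.hasNext()) {
                    chunk.add(items.next());
                }
                return chunk;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<List<Integer>> it = chunks(Arrays.asList(1, 2, 3, 4, 5).iterator(), 2);
        int full = 0;
        while (it.hasNext()) {
            List<Integer> chunk = it.next();
            if (chunk.size() == 2) {   // same idea as filter(l -> l.size() == 52)
                full++;
            }
        }
        System.out.println(full + " full chunks");
    }
}
```

With the file from the question, `chunks(lines.iterator(), 52)` would yield one list per 52 lines, where `lines` is the stream returned by `Files.lines`.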

Skip lines in a CSV if an exception is thrown (Java Stream)