Question

I am trying to parse a file that contains information in the following format:

TABLE_NAME

VARIABLE_LIST_OF_COLUMNS

VARIABLE_NUMBER_OF_ROWS (fields separated by a tab delimiter)

An example (using ',' as the delimiter for the purposes of this question; the actual delimiter is a tab):

STUDENTS

ID

NAME

1,Mike

2,Kimberly

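The idea is to build a list of SQL INSERT statements (that is the context for the code snippet).

What I would like to know is whether this kind of multi-line parsing is possible with the Java 8 Stream API. This is what I have so far: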

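import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;

public final class StatementGeneratorMain {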

    public static void main(final String[] args) throws Exception{
        List<String> fileNames = Arrays
            .asList("STUDENTS.txt");
        fileNames.stream()
            .forEach(fileName -> {
                String tableName;
                List<String> columnNames;
                List<String[]>  dataRows;
                try (BufferedReader br = getBufferedReader(fileName)) {
                    tableName = br.lines().findFirst().get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                try (BufferedReader br = getBufferedReader(fileName)) {
                    // skip the first line because it has already been processed
                    columnNames = br.lines().skip(1).filter(v -> v.split("\t").length == 1).collect(toList());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                try (BufferedReader br = getBufferedReader(fileName)) {
                    // skip the first line and the column-name lines to get to the data;
                    // data rows are the lines that can be split on the delimiter
                    dataRows = br.lines().skip(1 + columnNames.size()).map(s -> s.split("\t"))
                        .collect(toList());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }

                String columns = columnNames.stream().collect(joining(",","(",")"));

                List<String> dataRow = dataRows.stream()
                    .map(arr -> Arrays.stream(arr).map(x -> "'" + x + "'").collect(joining(",", "(", ")")))
                    .map(row -> String.format("INSERT INTO %s %s VALUES %s;", tableName, columns, row))
                    .collect(toList());

                dataRow.forEach(l -> System.out.println(l));
            });
    }

    private static BufferedReader getBufferedReader(String fileName) {
        return new BufferedReader(new InputStreamReader(StatementGeneratorMain.class.getClassLoader().getResourceAsStream(
            fileName)));
    }
}

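This code gets the job done, but I don't like it because I read the same file three times (once for the table name, again to infer the columns, and once more to get the rows). I also don't think it is proper functional style.

What I am looking for is a more elegant way to do this kind of multi-line/multi-record parsing with the Stream API.

For completeness, the output is: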

INSERT INTO STUDENTS (ID,NAME) VALUES ('1','Mike');

INSERT INTO STUDENTS (ID,NAME) VALUES ('2','Kimberly');

At this point I am not too concerned about numeric columns, null values, and the like.

Answer 1

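I'm not sure streams are the right tool here, because they are meant for iterating over data once, or more precisely, for processing all of the data in one uniform way. If you need to handle separate chunks of data differently, you should use a plain old loop or an iterator. One of the simplest solutions that comes to mind is to use Scanner, so your code could look like this: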

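// Imports needed by this snippet; the statements below belong inside a method.
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.joining;

Pattern oneWordLine = Pattern.compile("^\\w+$", Pattern.MULTILINE);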

List<String> files = Arrays.asList("input.txt");
for (String file : files) {

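    try (Scanner sc = new Scanner(new File(file))) {
        // Use line breaks as the token delimiter so that hasNext(oneWordLine)
        // tests a whole line. With the default whitespace delimiter, the first
        // field of a tab-separated row (e.g. "1") would also match \w+.
        sc.useDelimiter("\\R");

        String tableName = sc.nextLine();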

        StringJoiner columnNamesJoiner = new StringJoiner(", ", "(", ")");
        // iterate over lines with single words
        while (sc.hasNext(oneWordLine)) {
            columnNamesJoiner.add(sc.nextLine());
        }
        String columns = columnNamesJoiner.toString();


        List<String> dataRow = new ArrayList<>();
        // iterate over rest of lines
        while (sc.hasNextLine()) {
            String values = Arrays.stream(sc.nextLine().split("\t")) 
                    .collect(joining("', '", "('", "')"));
            dataRow.add(String.format("INSERT INTO %s %s VALUES %s;", 
                    tableName,columns, values));
        }

        dataRow.forEach(System.out::println);

    } catch (Exception e) {
        e.printStackTrace(); // no need to rethrow as a RuntimeException
    }
}
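
To try the Scanner approach without a file on disk, here is a minimal self-contained sketch that runs the same loop over an in-memory string. The class name ScannerStatementGenerator and the method generateInserts are hypothetical names chosen for illustration. It assumes the line-break delimiter is set on the Scanner, so that the one-word check applies to a whole line rather than to the first tab-separated field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name, for illustration only.
final class ScannerStatementGenerator {

    // A line that consists of a single word (used to recognise column-name lines).
    private static final Pattern ONE_WORD_LINE = Pattern.compile("^\\w+$", Pattern.MULTILINE);

    static List<String> generateInserts(String content) {
        List<String> statements = new ArrayList<>();
        try (Scanner sc = new Scanner(content)) {
            // Assumption: one token per line, so hasNext(pattern) sees the full line.
            sc.useDelimiter("\\R");

            String tableName = sc.nextLine();

            // Consume single-word lines as column names.
            StringJoiner columns = new StringJoiner(", ", "(", ")");
            while (sc.hasNext(ONE_WORD_LINE)) {
                columns.add(sc.nextLine());
            }

            // Every remaining line is a tab-separated data row.
            while (sc.hasNextLine()) {
                String values = Arrays.stream(sc.nextLine().split("\t"))
                        .collect(Collectors.joining("', '", "('", "')"));
                statements.add(String.format("INSERT INTO %s %s VALUES %s;",
                        tableName, columns, values));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String sample = "STUDENTS\nID\nNAME\n1\tMike\n2\tKimberly\n";
        generateInserts(sample).forEach(System.out::println);
    }
}
```

Note that this variant separates values with ", " rather than ",", so the statements differ from the question's output only in whitespace.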

Answer 2

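You can move the statement "BufferedReader br = getBufferedReader(fileName)" to the top and read from that single reader as needed. I don't think there is any need to read the file three times.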
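
One way this suggestion might look in practice is sketched below: a single BufferedReader is opened once, the table name is taken with readLine(), and the remaining lines are collected in one pass. The names SinglePassStatementGenerator and generateInserts are hypothetical. The sketch assumes, like the question's code, that column-name lines contain no tab and that every data row contains at least one, so a single-column table would not be handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, for illustration only.
final class SinglePassStatementGenerator {

    // Reads everything from one reader, so the input is traversed only once.
    static List<String> generateInserts(BufferedReader br) throws IOException {
        String tableName = br.readLine(); // first line: table name
        // lines() continues from the reader's current position, i.e. after the table name.
        List<String> rest = br.lines().collect(Collectors.toList());

        // Assumption: column names are the leading lines without a tab,
        // and the first line containing a tab starts the data rows.
        int firstRow = 0;
        while (firstRow < rest.size() && !rest.get(firstRow).contains("\t")) {
            firstRow++;
        }

        String columns = rest.subList(0, firstRow).stream()
                .collect(Collectors.joining(",", "(", ")"));

        return rest.subList(firstRow, rest.size()).stream()
                .map(line -> Arrays.stream(line.split("\t"))
                        .map(value -> "'" + value + "'")
                        .collect(Collectors.joining(",", "(", ")")))
                .map(values -> String.format("INSERT INTO %s %s VALUES %s;",
                        tableName, columns, values))
                .collect(Collectors.toList());
    }

    // Convenience overload for in-memory input.
    static List<String> generateInserts(String content) {
        try (BufferedReader br = new BufferedReader(new StringReader(content))) {
            return generateInserts(br);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "STUDENTS\nID\nNAME\n1\tMike\n2\tKimberly\n";
        generateInserts(sample).forEach(System.out::println);
    }
}
```

In the question's program, the reader returned by getBufferedReader(fileName) could be passed to the BufferedReader overload inside one try-with-resources block, replacing the three separate blocks.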

Parsing multi-line records with Java 8 streams