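Regular expression for a CSV file in Java

Date: 2017-05-02 22:31:24

Tags: java regex csv

I need to identify the rows in a CSV file that satisfy given search criteria. The data in the CSV file looks like this: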

Wilbur Smith,Elephant Song,McMillain,1992,1
Wilbur Smith,Birds of Prey,McMillain,1992,1
George Orwell,Animal Farm,Secker & Warburg,1945,1
George Orwell,1984,Secker & Warburg,1949,1

The search criteria look like this:

Orwell,,,,
,Elephant,,,

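The first criterion identifies two rows (both George Orwell rows) and the second identifies one row (Elephant Song). I am currently reading the file as shown below, but without using the criteria above.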

br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
    String[] dataItems = line.split(cvsSplitBy);

    // columns in the data: 0 = author, 1 = title, 2 = publisher
    if (dataItems[0].contains(author) && dataItems[1].contains(title) && dataItems[2].contains(publisher)) {
        bk[i++] = line;
        // stop once the fixed-size array is full
        if (i >= bk.length) {break;}
    }
}

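I am adding the matching lines to a fixed-size array. How can I use the criteria as regular expressions to identify the lines?

1 answer:

Answer 0 (score: 1)

It seems I'm in the minority here :) but here is a version that uses regular expressions, if you're interested.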

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

public class CsvRegexSearch {
    public static void main(String[] args) {
        String[] searches = {
            ",Animal Farm,Secker & Warburg,,",
            ",,Secker & Warburg,,",
            "George Orwell,,,,1",
            "Wilbur Smith,,,,",
            ",,,,1",
            "random,,,,1",
            "WILBUR SMITH,Birds of PREY,mcmillain,1992,1",
            ",,,,"
        };

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
            String line;

            // to store results of matches for easier output
            String[] matchResult = new String[searches.length];

            while ((line = br.readLine()) != null) {
                // go through all searches
                for (int i = 0; i < searches.length; i++) {
                    /*
                     *  replace a comma with ".*,.*" when it is not preceded by a
                     *  letter or digit, or when it is followed by something other
                     *  than a letter, digit, or comma
                     */
                    String searchPattern = searches[i].replaceAll("(?<![a-zA-Z0-9]),|,(?![a-zA-Z0-9,])", ".*,.*");

                    // match the whole line, ignoring case
                    boolean matched = Pattern.compile("^" + searchPattern + "$", Pattern.CASE_INSENSITIVE)
                            .matcher(line).matches();

                    // store the result
                    matchResult[i] = matched ? "matches" : "no match";
                }

                // the format string has seven result columns, so the result of the
                // last search (",,,,") is computed but not printed
                System.out.println(String.format("%-50s %-10s %-10s %-10s %-10s %-10s %-10s %-10s", line,
                        matchResult[0], matchResult[1], matchResult[2], matchResult[3], matchResult[4], matchResult[5], matchResult[6]));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Output

Wilbur Smith,Elephant Song,McMillain,1992,1        no match   no match   no match   matches    matches    no match   no match  
Wilbur Smith,Birds of Prey,McMillain,1992,1        no match   no match   no match   matches    matches    no match   matches   
George Orwell,Animal Farm,Secker & Warburg,1945,1  matches    matches    matches    no match   matches    no match   no match  
George Orwell,1984,Secker & Warburg,1949,1         no match   matches    matches    no match   matches    no match   no match
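
One thing to note: as far as I can tell, the pattern above requires each non-empty search field to equal the whole column value (ignoring case). The question's criteria use partial values (`Orwell`, `Elephant`), so they would not match with it as written. A minimal sketch of a variant that builds the pattern field by field and accepts partial values follows. The class and method names (`CriteriaMatcher`, `toPattern`, `filter`) are made up for illustration, and it assumes that no field contains a quoted or embedded comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class CriteriaMatcher {

    // Builds one pattern per criterion row: an empty field matches any value,
    // a non-empty field must occur somewhere inside the corresponding column.
    static Pattern toPattern(String criterion) {
        // limit -1 keeps trailing empty fields, so "Orwell,,,," yields 5 fields
        String[] fields = criterion.split(",", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                regex.append(',');
            }
            regex.append("[^,]*");
            if (!fields[i].isEmpty()) {
                // Pattern.quote makes the search text literal (no regex metacharacters)
                regex.append(Pattern.quote(fields[i])).append("[^,]*");
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns every line that matches the criterion as a whole.
    static List<String> filter(List<String> lines, String criterion) {
        Pattern p = toPattern(criterion);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Wilbur Smith,Elephant Song,McMillain,1992,1",
                "Wilbur Smith,Birds of Prey,McMillain,1992,1",
                "George Orwell,Animal Farm,Secker & Warburg,1945,1",
                "George Orwell,1984,Secker & Warburg,1949,1");

        // The two criteria from the question
        System.out.println(filter(lines, "Orwell,,,,"));
        System.out.println(filter(lines, ",Elephant,,,"));
    }
}
```

With the four sample rows, `Orwell,,,,` selects the two George Orwell rows and `,Elephant,,,` selects the Elephant Song row. Collecting the hits in a `List` also avoids having to size an array up front.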