Split a string on commas while respecting escaped commas and escaped backslashes

Date: 2014-02-11 09:43:39

Tags: java regex string escaping

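I want to split a string at commas ",". The string can contain escaped commas "\," and escaped backslashes "\\". Leading and trailing commas, as well as several consecutive commas, should produce empty strings.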

So ",,\,\\,," should become the five strings "", "", "\,\\", "", "".

Note that my example strings show each backslash as a single "\". In Java string literals they would be doubled.
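To make the doubling concrete, here is a small sketch (the class name EscapedLiteralDemo is made up for illustration) that writes the example input and the five expected fields as Java literals and prints them back in the single-backslash form used above.

```java
public class EscapedLiteralDemo {
    // In Java source every backslash is doubled, so this literal holds the 8 characters ,,\,\\,,
    static final String INPUT = ",,\\,\\\\,,";
    // The five fields the question expects, also written as Java literals
    static final String[] EXPECTED = {"", "", "\\,\\\\", "", ""};

    public static void main(String[] args) {
        System.out.println(INPUT);           // prints ,,\,\\,,
        System.out.println(INPUT.length());  // prints 8
        System.out.println(EXPECTED.length); // prints 5
        System.out.println(EXPECTED[2]);     // prints \,\\
    }
}
```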

I tried several packages without success. My last idea is to write my own parser.

4 answers:

Answer 0 (score: 0)

While a dedicated library is certainly a good idea, the following will work:

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

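The idea is to find every "," that is preceded by no "\" or by an even number of "\" (i.e. an unescaped comma). Because the comma is the last character of the pattern, each field is cut at end()-1, which is exactly the position of that comma.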
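A minimal sketch that reuses the same pattern but only reports where the separator commas are, to make the end()-1 arithmetic visible. The class SeparatorDemo and the method separatorIndexes are hypothetical names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Same pattern as in splitValues: a comma preceded by an even number (0, 2, 4, ...)
    // of backslashes; the lookbehind forbids one more backslash in front of that run
    private static final Pattern SEPARATOR = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Returns the index of every unescaped comma. end() - 1 is the comma itself,
    // because the comma is always the last character of a match.
    public static List<Integer> separatorIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher matcher = SEPARATOR.matcher(input);
        while (matcher.find()) {
            indexes.add(matcher.end() - 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // ,,\,\\,, : the comma at index 3 is escaped; the one at index 6 follows \\ and is not
        System.out.println(separatorIndexes(",,\\,\\\\,,")); // prints [0, 1, 6, 7]
    }
}
```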

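As far as I can tell, the function handles every input except null. If you prefer working with a List<String>, you can of course change the return type; I simply followed the pattern of split() and extended it to handle escapes.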
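If you do want a List<String>, one possible variant is sketched below. The method name splitValuesToList is hypothetical, and returning an empty list for null input is only an assumed choice (throwing NullPointerException would be another). The main method also shows a difference from String.split(): with its default limit, split() drops trailing empty strings, while this approach keeps them, as the question requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitToList {
    private static final Pattern SEPARATOR = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical List-returning variant of splitValues.
    // Assumption: null input yields an empty list instead of an exception.
    public static List<String> splitValuesToList(String input) {
        if (input == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        Matcher matcher = SEPARATOR.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // end() - 1 is the separator comma, so the field stops just before it
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        // The last field is always added, even when it is empty
        result.add(input.substring(previous));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitValuesToList(",a,,").size()); // prints 4
        System.out.println(",a,,".split(",").length);         // prints 2
    }
}
```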

An example class using this function:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\""+input+"\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for(int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}

Answer 1 (score: 0)

In this case a custom function sounds better to me. Try this:

public String[] splitEscapedString(String s) {
    //Character that won't appear in the string.
    //If you are reading lines, '\n' should work fine since it will never appear.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for(int i = 0;i<s.length();++i){
        if(s.charAt(i)=='\\') {
            //Note: this drops the backslash and keeps only the escaped character,
            //so "\," becomes "," and "\\" becomes "\".
            //If the String is well formatted (every '\' is followed by a character),
            //this line cannot run past the end of the string.
            sb.append(s.charAt(++i));
        }
        else {
            if(s.charAt(i) == ',') {
                sb.append(c);
            }
            else {
                sb.append(s.charAt(i));
            }
        }
    }
    //A negative limit keeps trailing empty strings, which split(c) alone would drop.
    return sb.toString().split(c, -1);
}
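Note that splitEscapedString removes the escaping backslashes, so its fields differ from the question's expected output ("\,\\" comes back as ",\"). If the escapes should stay in the fields, a single-pass scanner can copy each escape pair unchanged. The following is a sketch with hypothetical names (EscapePreservingSplitter, split), assuming that a lone backslash at the very end of the input is simply kept as a character.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapePreservingSplitter {
    // Splits on unescaped separators. Escape pairs such as \, and \\ are copied into
    // the field unchanged, and empty fields (including trailing ones) are kept.
    public static List<String> split(String s, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Keep the backslash and the character it escapes, then skip past both
                field.append(ch).append(s.charAt(i + 1));
                i++;
            } else if (ch == separator) {
                // Unescaped separator: close the current field (possibly empty)
                fields.add(field.toString());
                field.setLength(0);
            } else {
                // Ordinary character, or a lone trailing backslash (assumed to be kept)
                field.append(ch);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = split(",,\\,\\\\,,", ',');
        System.out.println(fields.size()); // prints 5
        System.out.println(fields.get(2)); // prints \,\\
    }
}
```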

Answer 2 (score: 0)

Don't use .split(); instead, find all matches between the (unescaped) commas:

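List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "\\G          # Start where the previous match ended\n" +
    "(            # Group 1: one field\n" +
    " (?:         # Start of group\n" +
    "  \\\\.      # Match either an escaped character\n" +
    " |           # or\n" +
    "  [^\\\\,]++ # Match one or more characters except comma/backslash\n" +
    " )*          # Do this any number of times\n" +
    ")            # End of field\n" +
    "(,|\\z)      # Group 2: the separating comma, or the end of the input",
    Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group(1));
    // Group 2 is empty only for the last field, which ends at the end of the input
    if (regexMatcher.group(2).isEmpty()) {
        break;
    }
}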
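A regex that can match the empty string, used in a plain find() loop, also reports an empty match at the comma that ends each non-empty field, because find() resumes exactly where the previous match stopped. The sketch below shows this with the simpler pattern [^,]* (chosen only for illustration; EmptyMatchDemo and allMatches are made-up names): "a,b" yields four matches instead of two fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Collects every match that a plain find() loop reports
    public static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // After "a", find() resumes at the comma, where [^,]* matches the empty string
        System.out.println(allMatches("[^,]*", "a,b")); // prints [a, , b, ]
    }
}
```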

Result (written as Java string literals): ["", "", "\\,\\\\", "", ""]

I used a possessive quantifier (++) to avoid the excessive backtracking that nested quantifiers can cause.
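What "possessive" means can be seen on a tiny example (the class name PossessiveDemo is made up): a possessive quantifier keeps everything it has matched and never backtracks into it. In the pattern above, this means a run consumed by [^\\,]++ cannot be re-divided between iterations of the outer *, which is where the excessive backtracking would otherwise come from.

```java
public class PossessiveDemo {
    public static void main(String[] args) {
        // Greedy a+ gives back one 'a', so the final 'a' in the pattern can still match
        System.out.println("aaa".matches("a+a"));  // prints true
        // Possessive a++ keeps all three 'a's and never gives one back, so the match fails
        System.out.println("aaa".matches("a++a")); // prints false
    }
}
```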

Answer 3 (score: 0)

I use the following solution, a general-purpose string splitter that handles quotes (' and ") and the escape character (\).

public static List<String> split(String str, final char splitChar) {
    List<String> queries = new ArrayList<>();
    int length = str.length();
    int start = 0, current = 0;
    char ch, quoteChar;
    
    while (current < length) {
        ch=str.charAt(current);
        // Handle escape char by skipping next char
        if(ch == '\\') {
            current++;
        }else if(ch == '\'' || ch=='"'){ // Handle quoted values
            quoteChar = ch;
            current++;
            while(current < length) {
                ch = str.charAt(current);
                // Handle escape char by skipping next char
                if (ch == '\\') {
                    current++;
                } else if (ch == quoteChar) {
                    break;
                }
                current++;
            }
    }else if(ch == splitChar) { // Split string
        queries.add(str.substring(start, current + 1)); // the separator stays at the end of the piece
            start = current + 1;
        }
        current++;
    }
    // Add last value
    if (start < current) {
        queries.add(str.substring(start));
    }
    return queries;
}

public static void main(String[] args) {

    String str = "abc,x\\,yz,'de,f',\"lm,n\"";
    List<String> queries = split(str, ',');
    System.out.println("Size: "+queries.size());
    for (String query : queries) {
        System.out.println(query);
    }
}

Output:

Size: 4
abc,
x\,yz,
'de,f',
"lm,n"