Question

Related question

I have a string

a\;b\\;c;d

which in Java looks like

String s = "a\\;b\\\\;c;d";

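I need to split it on semicolons according to the following rules:

If a semicolon is preceded by a backslash, it should not be treated as a separator (between a and b).
If the backslash itself is escaped, and therefore does not escape the semicolon, the semicolon should be a separator (between b and c).
In short, a semicolon should be treated as a separator if it is preceded by zero or an even number of backslashes.

For the example above, I want to get the following strings (shown as raw text; in Java source each backslash is doubled for the compiler):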
```
a\;b\\
c
d
```
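
The rule above can be restated as a small check on the number of backslashes directly before a semicolon. This is only a sketch to make the rule concrete; the class name `SeparatorRule` and the method `isSeparator` are made up for illustration and are not part of the question.

```java
class SeparatorRule {
    // Returns true if the character at index i is a ';' preceded by
    // zero or an even number of consecutive backslashes.
    static boolean isSeparator(String s, int i) {
        int backslashes = 0;
        for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
            backslashes++;
        }
        return s.charAt(i) == ';' && backslashes % 2 == 0;
    }

    public static void main(String[] args) {
        String s = "a\\;b\\\\;c;d"; // raw text: a\;b\\;c;d
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ';') {
                System.out.println("index " + i + ": separator = " + isSeparator(s, i));
            }
        }
    }
}
```

For the sample string this reports the semicolon at index 2 as escaped and the ones at indices 6 and 8 as separators.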

Answer 1

You can use the regex

(?:\\.|[^;\\]++)*

to match all text between unescaped semicolons:

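import java.util.*;
import java.util.regex.*;

List<String> matchList = new ArrayList<String>();
try {
    // Each match is a run of escaped characters and non-separator characters.
    Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        matchList.add(regexMatcher.group());
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}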

Explanation:

(?:        # Match either...
 \\.       # any escaped character
|          # or...
 [^;\\]++  # any character(s) except semicolon or backslash; possessive match
)*         # Repeat any number of times.

The possessive match (++) is important to avoid catastrophic backtracking caused by the nested quantifiers.
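
Because this pattern can match the empty string, a plain find() loop may also report zero-length matches at each semicolon and at the end of the input. One way around that, sketched below, is to anchor each match with \G and require a trailing semicolon or end of input. This is a variant of the pattern above, not the answer's original code; the class name `FieldMatcherDemo` and the method `splitFields` are hypothetical, and the sketch assumes the input does not end in a dangling backslash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldMatcherDemo {
    // Group 1: one field (escaped characters or non-separator characters).
    // Group 2: the ';' that ends the field, or empty at end of input.
    private static final Pattern FIELD = Pattern.compile(
            "\\G((?:\\\\.|[^;\\\\]++)*)(;|\\z)", Pattern.DOTALL);

    static List<String> splitFields(String s) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(s);
        while (m.find()) {
            fields.add(m.group(1));
            if (m.group(2).isEmpty()) {
                break; // reached end of input, no further fields
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Raw text: a\;b\\;c;d
        System.out.println(splitFields("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```

Empty fields are preserved, so an input such as a;;b; yields four fields, two of them empty.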

Answer 2

String[] splitArray = subjectString.split("(?<!(?<!\\\\)\\\\);");

This should work.

Explanation:

// (?<!(?<!\\)\\);
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!(?<!\\)\\)»
//    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\)»
//       Match the character “\” literally «\\»
//    Match the character “\” literally «\\»
// Match the character “;” literally «;»

So you just match semicolons that are not preceded by a single, unescaped \.
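
A quick way to try this pattern is sketched below. The class name `LookbehindSplitDemo` is made up for illustration. The second input has three backslashes before the semicolon, which by the question's rules is an escaped semicolon; this first pattern still splits there, which is the case the edit that follows is aimed at.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Pattern from the answer: a ';' not preceded by a lone backslash.
    static final String SEPARATOR = "(?<!(?<!\\\\)\\\\);";

    static String[] split(String s) {
        return s.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // Raw text: a\;b\\;c;d
        System.out.println(Arrays.toString(split("a\\;b\\\\;c;d"))); // prints [a\;b\\, c, d]
        // Raw text: x\\\;y (three backslashes, so the ';' should count as escaped)
        System.out.println(Arrays.toString(split("x\\\\\\;y")));     // prints [x\\\, y]
    }
}
```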

Edit:

String[] splitArray = subjectString.split("(?<!(?<!\\\\(\\\\\\\\){0,2000000})\\\\);");

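This is meant to handle any odd number of backslashes. It will of course fail if you have more than 4000000 of them. Explanation of the edited answer: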

// (?<!(?<!\\(\\\\){0,2000000})\\);
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!(?<!\\(\\\\){0,2000000})\\)»
//    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\(\\\\){0,2000000})»
//       Match the character “\” literally «\\»
//       Match the regular expression below and capture its match into backreference number 1 «(\\\\){0,2000000}»
//          Between zero and 2000000 times, as many times as possible, giving back as needed (greedy) «{0,2000000}»
//          Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{0,2000000}»
//          Match the character “\” literally «\\»
//          Match the character “\” literally «\\»
//    Match the character “\” literally «\\»
// Match the character “;” literally «;»

Answer 3

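I don't trust any regex to detect these cases. I usually write a simple loop for things like this. I'll sketch it in C, since it has been ages since I last touched Java ;-)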

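#include <stdio.h>
#include <string.h>

/* Print the offset of every unescaped ';' in myString. */
void findSeparators(const char *myString) {
    int state = 0; /* 1 while the current character is escaped by a preceding backslash */
    size_t len = strlen(myString);

    for (size_t i = 0; i < len; i++) {
        char c = myString[i];
        if (state == 0) {
            if (c == '\\') {
                state = 1; /* the next character is escaped */
            } else if (c == ';') {
                printf("; at offset %zu\n", i);
            }
        } else {
            state = 0; /* escaped character consumed */
        }
    }
}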

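The advantages are:

You can perform semantic actions at each step.
It is very easy to port to another language.
You don't need to pull in a full regex library for this simple task, which improves portability.
It should be a lot faster than a regex matcher.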
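
Since the question is about Java, here is a rough port of the same state machine that collects the fields instead of printing offsets. It is a sketch, not the answer's own code; the class name `EscapeAwareSplitter` is hypothetical, and it keeps the escape sequences verbatim in the output, as the question's expected result does.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Splits on ';' unless the ';' is escaped by a preceding backslash.
    // Escape sequences are copied into the fields unchanged.
    static List<String> split(String s) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false; // true when the previous char was an escaping backslash

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (escaped) {
                current.append(c);
                escaped = false;
            } else if (c == '\\') {
                current.append(c);
                escaped = true;
            } else if (c == ';') {
                fields.add(current.toString());
                current.setLength(0); // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        // Raw text: a\;b\\;c;d
        System.out.println(split("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```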

Answer 4

This approach assumes that your string does not contain the char '\0'. If it does, you can use some other character as the placeholder.

public static String[] split(String s) {
    String[] result = s.replaceAll("([^\\\\])\\\\;", "$1\0").split(";");
    for (int i = 0; i < result.length; i++) {
        result[i] = result[i].replaceAll("\0", "\\\\;");
    }
    return result;
}

Answer 5

This is what I believe is the real answer. In my case, I was trying to split on | with & as the escape character.

    final String regx = "(?<!((?:[^&]|^)(&&){0,10000}&))\\|";
    String[] res = "&|aa|aa|&|&&&|&&|s||||e|".split(regx);
    System.out.println(Arrays.toString(res));

In this code I use a lookbehind to handle the escape character. Note that a lookbehind must have a bounded maximum length.

(?<!((?:[^&]|^)(&&){0,10000}&))\\|

This matches any | except those preceded by ((?:[^&]|^)(&&){0,10000}&), and that part matches any odd number of & characters. The (?:[^&]|^) part is important: it makes sure you count all of the & characters behind the |, back to the start of the string or to some other character.
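
To see the odd/even counting on a shorter input, the pattern can be tried as follows. The class name `AmpersandEscapeDemo` and the sample strings are made up for illustration; only the regex comes from the answer.

```java
import java.util.Arrays;

class AmpersandEscapeDemo {
    // Pattern from the answer: '|' is a separator unless an odd run of '&' precedes it.
    static final String SEPARATOR = "(?<!((?:[^&]|^)(&&){0,10000}&))\\|";

    static String[] split(String s) {
        return s.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // One '&' escapes the first '|'; two '&' before the last '|' do not.
        System.out.println(Arrays.toString(split("a&|b|c&&|d"))); // prints [a&|b, c&&, d]
    }
}
```

The first | is escaped by a single &, the second has no & before it, and the third follows an even run of &, so only the last two act as separators.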

Regex and escaped and unescaped delimiter
