Question

In a previous question of mine,

String split in java using advanced regex

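Someone gave me an excellent answer to my problem (as described in the link above), but I never managed to fully understand it. Can somebody help me? The regex I was given is this: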

"(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+"

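I can understand some of the basics, but there are parts of this regex that I couldn't find explained even after thoroughly searching Google, like the question mark before the s at the start, or how exactly the second bracket works with the question mark and the equals sign at its start. Is it also possible to expand it and make it work with other types of quotes, for example “ ”?

Any help is really appreciated.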

Answer 1

Explanation of "(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+":

(?s)    # This equals a DOTALL flag in regex, which allows the `.` to match newline characters. As far as I can tell from your regex, it's superfluous.
(?=     # Start of a lookahead, it checks ahead in the regex, but matches "an empty string"(1) read more about that [here][1] 
(([^\"]+\"){2})*  # This group is repeated any amount of times, including none. I will explain the content in more detail.
    ([^\"]+\")    # This is looking for one or more occurrences of a character that is not `"`, followed by a `"`. 
    {2}           # Repeat 2 times. When combined with the previous group, it is looking for 2 occurrences of text followed by a quote. In effect, this means it is looking for an even number of `"`.
[^\"]*  # Matches any character which is not a double quote sign. This means literally _any_ character, including newline characters without enabling the DOTALL flag
$       # The lookahead actually inspects until end of string.
)       # End of lookahead
\\s+    # Matches one or more  whitespace characters, including spaces, tabs and so on

The complicated group in there that is repeated twice makes the regex match whitespace in this string only where it is not between two ";

text that has a "string in it".

When used with String.split, this splits the string into: [text, that, has, a, "string in it".]

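It only matches where an even number of " lies ahead, so with a single unbalanced quote only the whitespace after that quote matches (zero quotes ahead); the whitespace before it has one quote ahead and is skipped;

text that nearly has a "string in it.

Splits the string into [text that nearly has a "string, in, it.]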

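(1) When I say that the lookahead matches "an empty string", I mean that it doesn't actually consume anything; it only looks ahead from the current point in the regex and checks a condition. The actual matching is done by the \\s+ that follows the lookahead.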
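To check the behaviour described above, here is a minimal sketch that runs the regex from the question through String.split on the two sample strings. The class name QuoteAwareSplit and the helper split are made up for this illustration.

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // The regex from the question, written as a Java string literal.
    static final String REGEX = "(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+";

    // Hypothetical helper: splits on whitespace that has an even number of " ahead.
    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // Balanced quotes: whitespace inside the quoted part is not a split point.
        System.out.println(Arrays.toString(split("text that has a \"string in it\".")));
        // One unbalanced quote: only whitespace after the quote has an even
        // number (zero) of quotes ahead, so only that whitespace splits.
        System.out.println(Arrays.toString(split("text that nearly has a \"string in it.")));
    }
}
```

Because the lookahead consumes nothing, the delimiter removed by split is only the whitespace matched by \\s+.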

Answer 2

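The (?s) part is an embedded flag expression that enables DOTALL mode, which means the following:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.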
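A small sketch of what the flag changes, and of why it makes no difference for a regex without a dot. The class and method names here are invented for the example.

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Tests whether "a.b" matches "a\nb", with or without the embedded (?s) flag.
    static boolean dotMatchesNewline(boolean dotall) {
        String regex = dotall ? "(?s)a.b" : "a.b";
        return Pattern.matches(regex, "a\nb");
    }

    // A negated character class such as [^"] matches a newline regardless of DOTALL.
    static boolean negatedClassMatchesNewline() {
        return Pattern.matches("a[^\"]b", "a\nb");
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(false));     // false
        System.out.println(dotMatchesNewline(true));      // true
        System.out.println(negatedClassMatchesNewline()); // true
    }
}
```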

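(?=expr) is a look-ahead expression. This means that the regex tries to match expr, but then goes back to the same point before continuing with the rest of the evaluation.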

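In this case, it means that the regex matches any \\s+ occurrence that is followed by any even number of ", and then by non-" characters until the end ($). In other words, it checks that there is an even number of " ahead.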

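It can be extended to other quotes, too. The only problem is the ([^\"]+\"){2} part, which would probably have to use a back-reference (\n) instead of the {2}.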
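For the curly quotes “ ” mentioned in the question, the opening and closing characters differ, so one possible variant (an assumption of mine, not the back-reference approach suggested above) spells out each open/close pair in the lookahead instead of counting with {2}. The class name CurlyQuoteSplit and its helper are hypothetical.

```java
import java.util.Arrays;

class CurlyQuoteSplit {
    // U+201C is the opening curly quote, U+201D the closing one.
    // The lookahead requires that everything ahead consists of complete
    // open/close pairs followed by text that contains no curly quotes.
    static final String REGEX =
        "(?=(?:[^\u201C\u201D]*\u201C[^\u201D]*\u201D)*[^\u201C\u201D]*$)\\s+";

    // Hypothetical helper: splits on whitespace outside curly-quoted text.
    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("say \u201Chello world\u201D now")));
    }
}
```

This sketch only handles curly quotes; mixing several quote styles in one pattern would need either an alternation of such pairs or the back-reference idea.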

Answer 3

It's pretty simple..

Concept

It splits at \s+ whenever there is an even number of " ahead.

For example:

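Hello hi "Hi World"
     ^  ^   ^
     |  |   |->will not split here since there is an odd number of " ahead
     ----
      |
      |->split here because there is an even number of " ahead

Grammar

\s matches \n or \r or space or \t

+ is a quantifier which matches the previous character or group 1 to many times

[^\"] matches anything except "

(x){2} matches x 2 times

a(?=bc) matches a if it is followed by bc

(?=ab)a first checks for ab from the current position and then returns to that position. It then matches a. (?=ab)c will not match c

With (?s) (single-line mode) . would match newlines. So in this case, since there is no ., there is no need for (?s)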
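The grammar items above can be tried out with a short sketch like the following. LookaheadDemo and firstMatch are names I made up; the helper returns the first match found in the input, or null when there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a(?=bc)", "abc")); // a (bc is checked but not consumed)
        System.out.println(firstMatch("a(?=bc)", "abd")); // null
        System.out.println(firstMatch("(?=ab)a", "abc")); // a
        System.out.println(firstMatch("(?=ab)c", "abc")); // null
        System.out.println(firstMatch("(x){2}", "xxx"));  // xx
    }
}
```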

I would use

to analyze more complex regexes
