我有一个像这样构建的几千个条目的文本文件:
11111111111:文本文本文本文本文本:: word11111111:文本文本文本文本:: word111111111:
其中:
11111111
是一个很大的数字text text text text
可以是anything
,包括表情符号word
是8个单词之一111111111
是另一个数字,但不同。我试过了,但是无法与之匹敌。
我不知道如何对待表情符号,另一个问题是空格不一致,有时是空格,有时是制表符,等等。
答案 0 :(得分:1)
^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$
此正则表达式将执行以下操作:
11111111
:
text text text text text
。 ::
word11111111
:
text text text text text
。 ::
word11111111
:
:
或::
成为分隔符要更好地查看图像,您可以右键单击它并选择在新窗口中打开
现场演示
https://regex101.com/r/qG7uZ7/1
示例文字
11111111111: text text text text text :: word11111111: text text text text :: word111111111:
从匹配中捕获群组
0. 11111111111: text text text text text :: word11111111: text text text text :: word111111111:
1. `11111111111`
2. `text text text text text`
3. `word11111111`
4. `text text text text`
5. `word111111111`
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------