Question

I have a text file with a few thousand entries, each built like this:

11111111111: text text text text text :: word11111111: text text text text :: word111111111:

Where:

11111111111 is a very large number
text text text text can be anything, including emojis
word is one of 8 words
the second 111111111 is another number, but a different one.

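I have tried, but I can't get a pattern to match it.

I don't know how to handle the emojis. Another problem is that the spacing is inconsistent: sometimes it is spaces, sometimes tabs, and so on.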

Answer 1

Description

^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$

Regular expression visualization

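This regular expression will do the following:

Captures the leading number, `11111111111`
Matches `:`
Captures the first `text text text text text`, which may contain emojis
Matches `::`
Captures `word11111111`
Matches `:`
Captures the second `text text text text`, which may contain emojis
Matches `::`
Captures `word111111111`
Matches the trailing `:`
Allows either `:` or `::` to act as a delimiter
Leaves the whitespace around each delimiter out of the captured text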
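
The question does not name a language, so the following is only a minimal sketch that assumes Python 3 and its built-in `re` module. The names `PATTERN`, `line`, and `groups` are illustrative and not part of the original answer. Python 3 strings are Unicode, so `.` matches emoji code points without any extra flag.

```python
import re

# The answer's expression, unchanged, compiled once for reuse.
PATTERN = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)

# Sample entry taken from the question.
line = (
    "11111111111: text text text text text :: word11111111: "
    "text text text text :: word111111111:"
)

match = PATTERN.match(line)
# groups is None when the line does not follow the expected layout.
groups = match.groups() if match else None
print(groups)
```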


Example

Live demo

https://regex101.com/r/qG7uZ7/1

Sample text

11111111111: text text text text text :: word11111111: text text text text :: word111111111:

Capture groups from the match

0.  `11111111111: text text text text text :: word11111111: text text text text :: word111111111:`
1.  `11111111111`
2.  `text text text text text`
3.  `word11111111`
4.  `text text text text`
5.  `word111111111`
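
To illustrate the two concerns raised in the question (emojis and mixed spaces/tabs), here is a hedged sketch that applies the same expression line by line. It assumes Python 3. The helper `parse_lines` and the sample data are hypothetical. Stripping each captured field is an extra step added here, not part of the original answer, in case several whitespace characters precede a delimiter.

```python
import re

PATTERN = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)


def parse_lines(text):
    """Return one tuple of five fields per line that matches the layout."""
    records = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match is None:
            # Skip lines that do not follow the expected layout.
            continue
        # Assumption: trim leftover whitespace around each captured field.
        records.append(tuple(field.strip() for field in match.groups()))
    return records


# Hypothetical input: tabs and spaces are mixed, and the text contains emojis.
sample = (
    "42:\thello 😀 world :: alpha7:\tmore 🎉 text :: beta9:\n"
    "not a record\n"
    "7:\tfoo\t::\tbar1:\tbaz\t::\tqux2:"
)

records = parse_lines(sample)
for record in records:
    print(record)
```

When reading from a file, opening it with an explicit `encoding="utf-8"` is a reasonable precaution so that the emojis decode correctly, assuming the file is UTF-8.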

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
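
The breakdown above can also be kept next to the pattern itself. The following sketch assumes Python 3 and uses the `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The names `VERBOSE_PATTERN`, `COMPACT_PATTERN`, and `sample` are illustrative only.

```python
import re

# Same expression as in the answer, spread out with comments via re.VERBOSE.
VERBOSE_PATTERN = re.compile(
    r"""
    ^
    ([0-9]+)            # group 1: leading number
    :\s*                # single-colon delimiter, optional whitespace
    ((?:(?!\s::).)*)    # group 2: free text, stops before whitespace + '::'
    \s::\s*             # double-colon delimiter with surrounding whitespace
    ([^:]+)             # group 3: word plus number (anything but a colon)
    \s*:\s*             # single-colon delimiter
    ((?:(?!\s::).)*)    # group 4: second free-text field
    \s::\s*             # double-colon delimiter
    ([^:]+)             # group 5: second word plus number
    :                   # trailing colon
    $
    """,
    re.VERBOSE,
)

# Compact form from the answer, kept for comparison.
COMPACT_PATTERN = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)

sample = (
    "11111111111: text text text text text :: word11111111: "
    "text text text text :: word111111111:"
)

verbose_groups = VERBOSE_PATTERN.match(sample).groups()
compact_groups = COMPACT_PATTERN.match(sample).groups()
print(verbose_groups == compact_groups)
```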

How can I match this pattern (with emojis)?