How do I match this pattern (with emoji)?

Time: 2016-05-29 09:41:54

Tags: regex text emoji

I have a text file with a few thousand entries, each built like this:

  

11111111111: text text text text text :: word11111111: text text text text :: word111111111:

where:

  • 11111111 is a large number
  • text text text text can be anything, including emoji
  • word is one of 8 words
  • the second 111111111 is another number, different from the first.

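I have tried, but I cannot get a pattern to match.

I don't know how to handle the emoji. Another problem is that the whitespace is inconsistent: sometimes it is spaces, sometimes tabs, and so on.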

1 answer:

Answer 0: (score: 1)

Description

^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$


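This regular expression will do the following:

  • Captures the leading 11111111111
  • Matches :
  • Captures text text text text text, which may contain emoji
  • Matches ::
  • Captures word11111111
  • Matches :
  • Captures text text text text, which may contain emoji
  • Matches ::
  • Captures word111111111
  • Matches :
  • Allows : and :: as delimiters
  • Does not include the whitespace directly around the delimiters in the captured groups.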
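
As a minimal sketch of the steps listed above, the pattern can be tried from Python's `re` module. The choice of Python and the sample line (with an emoji and a tab added) are assumptions for illustration; the question does not name a language.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)

# Hypothetical sample line: a tab after the first colon and an emoji in the
# first text field.
line = "11111111111:\ttext 😀 text :: word11111111: text text :: word111111111:"

match = PATTERN.match(line)
# groups() returns the five captures in order, or we fall back to None
# when the line does not follow the expected layout.
groups = match.groups() if match else None
print(groups)
```

In Python 3 the `.` token matches an emoji as a single code point, so no special handling is needed for the text fields in this sketch.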


Example

Live demo

https://regex101.com/r/qG7uZ7/1

Sample text

11111111111: text text text text text :: word11111111: text text text text :: word111111111:

Capture groups from the match

0.  `11111111111: text text text text text :: word11111111: text text text text :: word111111111:`
1.  `11111111111`
2.  `text text text text text`
3.  `word11111111`
4.  `text text text text`
5.  `word111111111`
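
To collect these groups for a whole file, one option is to apply the pattern line by line. The sketch below is an assumption about how that might look in Python: `parse_entries` and the sample input are hypothetical names and data, not part of the original answer. Each capture is stripped, because a run of several spaces or tabs next to a delimiter can otherwise leave whitespace inside a group.

```python
import re

PATTERN = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)

def parse_entries(text):
    """Return one tuple of five stripped fields per line that matches."""
    entries = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            # Strip every capture so leftover spaces or tabs around the
            # delimiters do not end up in the stored fields.
            entries.append(tuple(field.strip() for field in match.groups()))
    return entries

# Hypothetical input: tabs, doubled spaces, and one malformed line.
sample = (
    "11111111111: text text text text text :: word11111111: text text text text :: word111111111:\n"
    "222:\tsome  text\t:: alpha333:  more text  :: beta444:\n"
    "not an entry\n"
)
entries = parse_entries(sample)
```

Lines that do not follow the layout are skipped silently here; logging them instead may be more useful when cleaning a real file.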

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
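
The construct used for groups \2 and \4, `(?:(?!\s::).)*`, consumes one character at a time and stops only where a whitespace character followed by `::` comes next. A small Python sketch (the `FIELD` name and the sample string are made up for illustration) shows why this lets the text field contain a single colon or an emoji, whereas a plain `[^:]+` stops at the first colon.

```python
import re

# Same building block as groups \2 and \4, followed by the " ::" delimiter.
FIELD = re.compile(r"((?:(?!\s::).)*)\s::")

# Hypothetical text with a single colon and an emoji before the delimiter.
text = "note: see 🎉 here :: rest"

# The negative look-ahead only stops the match at "whitespace + ::",
# so the lone colon after "note" stays inside the captured field.
field = FIELD.match(text).group(1)

# For comparison, a negated character class stops at the first colon.
naive = re.match(r"([^:]+)", text).group(1)

print(repr(field))
print(repr(naive))
```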