Can someone explain this regular expression with a named group?

Asked: 2014-03-19 14:43:29

Tags: regex

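In the Sublime Text package CheckBounce (a package that brings the built-in OS X spell checker to ST), there is an option to define which text should be checked for each syntax. Here is the description from the settings file: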

// Optionally define regular expressions to set the scope in which spelling should
// be checked. The regex should define a named group called "checktext", which the
// package will use to extract the text to check. For example, a regex to skip the
// preamble in a LaTeX document might look like this:
//      (?s)(?<=\\begin\{document\})(?P<checktext>.*)
// This expression would only match text between HTML tags:
//      (\<\w+\>)(?P<checktext>.*)(\</\w+\>)
// Each one must appear in the dictionary below with the key set to the syntax name
// and the value set to the regular expression. Remember to double your backslashes.

So how do the two regular expressions mentioned there work? I don't understand how they match the preamble and the HTML tags.

An explanation would help me write my own regex that excludes all LaTeX-specific syntax, such as \centering or \cref.

1 answer:

Answer 0 (score: 1)

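This may help you get started. Note that these are very generic regular expressions. In the breakdowns below, the spaces between tokens are only there for readability and are not part of the patterns.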

 ## Regex 1 ---------------------------
 (?s)                    # Dot-All modifier (means dot . matches all chars, including newlines)
 (?<=                    # Lookbehind assertion
      \\ begin                # Literal backslash + 'begin'
      \{ document \}          # Literal '{' + 'document' + '}'
 )
 (?P<checktext>          # (1 start), Python style named capture group
      .*                      # Greedy dot: match as many characters as possible, up to the end of the string
 )                       # (1 end)

 ## Regex 2 ---------------------------
 (                       # (1 start), Open TAG
      \< \w+ \>               # Literal '<' + one or more word characters + '>'
 )                       # (1 end)
 (?P<checktext>          # (2 start), Python style named capture group
      .*                      # Greedy dot: match as many non-newline characters as possible, then backtrack until the close tag can match
 )                       # (2 end)
 (                       # (3 start), Close TAG
      \< / \w+ \>             # Literal '<' + '/' + one or more word characters + '>'
 )                       # (3 end)
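
To see what the "checktext" group actually captures, both patterns can be tried in Python, whose `re` module accepts the `(?P<name>...)` syntax used here. This is only a minimal sketch: the sample inputs and the helper `extract_checktext` are made up for illustration, and it assumes the package takes the first match of the pattern, which the settings comment does not confirm.

```python
import json
import re

# The two patterns from the settings comment, written as Python raw strings.
LATEX_BODY = r"(?s)(?<=\\begin\{document\})(?P<checktext>.*)"
BETWEEN_TAGS = r"(\<\w+\>)(?P<checktext>.*)(\</\w+\>)"


def extract_checktext(pattern, text):
    """Hypothetical helper: return the 'checktext' group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group("checktext") if match else None


# Hypothetical sample inputs, for illustration only.
latex_sample = (
    "\\documentclass{article}\n"
    "\\usepackage{amsmath}\n"
    "\\begin{document}\n"
    "Hello wrold.\n"
    "\\end{document}\n"
)
html_sample = "<p>Some <b>bold</b> text</p>"

latex_text = extract_checktext(LATEX_BODY, latex_sample)
html_text = extract_checktext(BETWEEN_TAGS, html_sample)

print(repr(latex_text))  # everything after \begin{document}, \end{document} included
print(repr(html_text))   # 'Some <b>bold</b> text'

# In a JSON settings file every backslash must be doubled; json.dumps shows that form.
print(json.dumps(LATEX_BODY))
```

The output illustrates why the patterns are only a starting point: the first one skips the preamble but still captures `\end{document}` and any commands in the body, and the second one keeps nested tags such as `<b>` inside the captured text and only matches opening tags that have no attributes.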