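How to account for special non-ASCII characters in a regular expression

Time: 2016-05-07 13:37:26

Tags: java regex

I don't know whether this is the problem, but I can't seem to get the following to match.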

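import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] seTab3_HighRes = null;

public Map<String, String> tab3HighResRegex(String x, Map<String, String> map) {
    // DOTALL lets '.' match line terminators, so the lazy group can span lines.
    Pattern Tab3_HighRes_pattern = Pattern.compile("High Resolution Parameters:(.*?Intrabolus pressure)", Pattern.DOTALL);
    Matcher matcherTab3_HighRes_pattern = Tab3_HighRes_pattern.matcher(x);

    while (matcherTab3_HighRes_pattern.find()) {
        System.out.println("Anything here? Nope");
        // Split the captured block into lines on either \n or \r.
        seTab3_HighRes = matcherTab3_HighRes_pattern.group(1).split("\\n|\\r");
    }
    return map;
}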

The text is:

 High Resolution Parameters:
    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4
    Some other stff: 123
    Intrabolus pressure (avg max)(mmHg):8.3 <17.0

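I took a closer look at the text and noticed that when I paste it into TextPad, there is a ^G character at the end of High Resolution Parameters:. What is it, is it the reason I'm not getting a match, and how do I get rid of it?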

1 answer:

Answer 0: (score: 0)

Description

You just need to match the ^G (Control-G) character with \cG.
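As a minimal sketch of what that means in Java: ^G is the BEL control character (U+0007), and inside a Java string literal the regex escape has to be written as "\\cG". The class and method names below are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

public class BellEscapeDemo {
    // \cG in a regex means Control-G, i.e. the BEL character U+0007.
    // The backslash is doubled because this is a Java string literal.
    private static final Pattern BELL = Pattern.compile("\\cG");

    /** Returns true if the input contains at least one ^G (BEL) character. */
    public static boolean containsBell(String input) {
        return BELL.matcher(input).find();
    }

    public static void main(String[] args) {
        // Assumed placement: the ^G sits right after the colon, as the question describes.
        String withBell = "High Resolution Parameters:\u0007";
        String withoutBell = "High Resolution Parameters:";
        System.out.println(containsBell(withBell));    // true
        System.out.println(containsBell(withoutBell)); // false
    }
}
```

Java's regex syntax also accepts \a and \x07 for the same character, so any of the three spellings should behave identically here.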

This regular expression does the following:

  • Matches High Resolution Parameters:
  • Finds the first Intrabolus pressure
  • Pulls out the substring that follows Intrabolus pressure ... :

Regular expression

High\sResolution\sParameters:(?:\cG|[\n\r\s])*(?:Intrabolus\spressure)[^:]*:([^\n]*)


Example

https://regex101.com/r/pE5aI0/1

Explanation

  • Capture Group 0 gets the entire matched string
  • Capture Group 1 gets the text that follows Intrabolus pressure ... : up to the end of that line
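The capture groups described above can be exercised from Java roughly as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, and the sample string places the ^G (written as \u0007) directly after the colon, which is how the question describes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntrabolusRegexDemo {
    // The answer's regex, with every backslash doubled for a Java string literal.
    private static final Pattern FIRST_INTRABOLUS = Pattern.compile(
        "High\\sResolution\\sParameters:(?:\\cG|[\\n\\r\\s])*"
        + "(?:Intrabolus\\spressure)[^:]*:([^\\n]*)");

    /** Returns capture group 1 for the first match, or null if nothing matches. */
    public static String firstIntrabolusValue(String text) {
        Matcher m = FIRST_INTRABOLUS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample text from the question, with an assumed ^G after the header.
        String text = " High Resolution Parameters:\u0007\n"
            + "    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4\n"
            + "    Some other stff: 123\n"
            + "    Intrabolus pressure (avg max)(mmHg):8.3 <17.0\n";
        System.out.println(firstIntrabolusValue(text)); // -3.7 <8.4
    }
}
```

Because group 1 is [^\n]*, it stops only at a line feed. With Windows line endings the captured value would keep a trailing \r, so calling trim() on the result is one way to handle that.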

Expanded

NODE                     EXPLANATION
----------------------------------------------------------------------
  High                     'High'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  Resolution               'Resolution'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  Parameters:              'Parameters:'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \cG                      ^G
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [\n\r\s]                 any character of: '\n' (newline), '\r'
                             (carriage return), whitespace (\n, \r,
                             \t, \f, and " ")
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    Intrabolus               'Intrabolus'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    pressure                 'pressure'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  [^:]*                    any character except: ':' (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^\n]*                   any character except: '\n' (newline) (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
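The question also asks how to get rid of the ^G rather than match it. One possible approach, which is an assumption here and not part of the answer above, is to remove the character before running any pattern. The class and method names are hypothetical.

```java
public class BellStripDemo {
    /** Returns the input with every ^G (BEL, U+0007) character removed. */
    public static String stripBell(String input) {
        // String.replace works on literal text, so no regex escaping is needed.
        return input.replace("\u0007", "");
    }

    public static void main(String[] args) {
        // Assumed input: the header line followed by a stray ^G.
        String raw = "High Resolution Parameters:\u0007\n"
            + "    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4";
        String clean = stripBell(raw);
        System.out.println(clean.indexOf('\u0007')); // -1
    }
}
```

Stripping keeps later patterns simpler, while matching with \cG leaves the input untouched. Which one fits better depends on whether the raw text is needed elsewhere.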