Question

The (?i) token in Yunqa DIRegEx is meant to make matching case-insensitive, but it does not seem to work with Cyrillic text. For example:

\P{Cyrillic}(?i)ново

should match the capitalized Ново, but it does not. Is there a way to make this work?

In the DIRegEx Workbench application, we see the following (screenshot not included):

My code uses:

 if ContainsText(MatchPattern, '(?i)') or 
    ContainsText(MatchPattern, '(?is)') or 
    ContainsText(MatchPattern, '(?si)') then 
      rexp.CompileOptions := [coCaseLess];

Answer 1

If you are working with UTF-8 encoded Unicode strings, you must set the [coUtf8] (PCRE_UTF8) compile option. With that option set, the match succeeds.

That is:

 rexp.CompileOptions := rexp.CompileOptions + [coUtf8];
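
A rough way to see why the option matters. My understanding (an assumption about PCRE's behaviour, not something taken from the DIRegEx documentation) is that without UTF mode, caseless matching relies on a lookup table that only covers character values below 256. Both letters involved here sit well above that range, so the engine has no case mapping for them unless Unicode handling is switched on. The small program below uses only the Delphi RTL (no DIRegEx), and the helper name Describe is made up for illustration:

```pascal
program CodePointDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

{ Hypothetical helper: prints a character's code point and whether it
  lies outside the 0..255 range. }
procedure Describe(const Name: string; C: WideChar);
begin
  WriteLn(Name, ': U+', IntToHex(Ord(C), 4),
    ', above $FF: ', BoolToStr(Ord(C) > $FF, True));
end;

begin
  { Character escapes are used so the source file encoding does not matter. }
  Describe('CYRILLIC CAPITAL LETTER EN', #$041D);  { first letter of the subject }
  Describe('CYRILLIC SMALL LETTER EN', #$043D);    { first letter of the pattern }
  ReadLn;
end.
```

Both code points are reported as lying above $FF, which is consistent with the match failing until [coUtf8] is added.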

You need a Unicode font installed in the console to see the result, but here is a sample program that demonstrates the [coUtf8] option:

program Project1;
{$APPTYPE CONSOLE}
uses
  Windows, DISystemCompat, DIUtils, DIRegEx;
var
  RegEx: TDIRegEx16;
  matched : string;
begin
  SetConsoleOutputCP(CP_UTF8);
  RegEx := TDIPerlRegEx16.Create(nil);
  try
    { comment out line below to replicate problem }
    RegEx.CompileOptions := [coUtf8];
    RegEx.SetSubjectStr('Ново');
    RegEx.CompileMatchPatternStr('(?i)ново');
    if RegEx.Match(0) > 0 then
      repeat
        matched := RegEx.MatchedStr;
        WriteLn('Matched: ', UTF8Encode(matched));
      until RegEx.MatchNext < 0
    else
      WriteLn('No match.');
  finally
    RegEx.Free;
  end;
  ReadLn;
end.

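Note that because the pattern contains (?i), you do not need [coCaseLess] in the compile options: case-insensitivity is already specified explicitly in the pattern.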

If you would rather use the compile option, you can drop (?i) from the pattern and do the following instead, which also works:

 RegEx.CompileOptions := [coCaseLess, coUtf8];
 RegEx.SetSubjectStr('Ново');
 RegEx.CompileMatchPatternStr('ново');
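
Applied to the code in the question, one thing to watch is that rexp.CompileOptions := [coCaseLess] replaces the whole option set, so a coUtf8 added earlier would be dropped again. A minimal sketch that adds options instead of overwriting them, assuming rexp is a TDIRegEx16 as in the sample program (the question does not show its type); ApplyCompileOptions and the unit name are made-up names, not part of DIRegEx:

```pascal
unit RegExOptionsSketch;

interface

uses
  DIRegEx;

{ Hypothetical helper, not part of DIRegEx. }
procedure ApplyCompileOptions(rexp: TDIRegEx16; const MatchPattern: string);

implementation

uses
  StrUtils;

procedure ApplyCompileOptions(rexp: TDIRegEx16; const MatchPattern: string);
begin
  { Enable Unicode handling so that non-ASCII letters can be case-folded. }
  rexp.CompileOptions := rexp.CompileOptions + [coUtf8];

  { Same checks as in the question, but adding to the existing set
    rather than replacing it. }
  if ContainsText(MatchPattern, '(?i)') or
     ContainsText(MatchPattern, '(?is)') or
     ContainsText(MatchPattern, '(?si)') then
    rexp.CompileOptions := rexp.CompileOptions + [coCaseLess];
end;

end.
```

Since a pattern that already carries (?i) does not need coCaseLess, the second step is optional; the change that matters is adding coUtf8.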
