boost::regex and case-insensitive UTF-8 matching (e.g. uppercase vs. lowercase umlauts)

Time: 2013-04-09 15:30:38

Tags: c++ regex boost utf-8

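After building the boost::regex library, version 1.52, with International Components for Unicode (ICU) support, regular expressions with case-insensitive matching do not seem to handle uppercase and lowercase German umlaut characters as expected.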

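// UTF-8 byte sequences: "\303\226" is Ö (U+00D6), "\303\266" is ö (U+00F6).
static const std::string pattern("^.*" "\303\226" ".*$");
static const std::string   test1("SCH" "\303\226" "NE");   // uppercase, same Ö as the pattern
static const std::string   test2("sch" "\303\266" "ne");   // lowercase, ö instead of Ö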
static const boost::regex exp(pattern, boost::regex::icase);
const char *result = (boost::regex_match(test1, exp)) ? "Match" : "NoMatch";
std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern 
    << "\" : " << result << std::endl;
result = (boost::regex_match(test2, exp)) ? "Match" : "NoMatch";
std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern 
    << "\" : " << result << std::endl;

This yields:

Testing "SCHÖNE" against pattern "^.*Ö.*$" : Match
Testing "schöne" against pattern "^.*Ö.*$" : NoMatch

1 answer:

Answer 0: (score: 2)

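See "Working with Unicode and ICU String Types" in the Boost.Regex documentation. A plain boost::regex over std::string works on individual char values, so icase cannot pair the two-byte UTF-8 sequences for Ö (0xC3 0x96) and ö (0xC3 0xB6). Use boost::u32regex together with the u32regex_* algorithms instead: they treat narrow-character input as UTF-8 and rely on ICU for Unicode-aware case folding.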
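To make the byte-level view concrete, here is a minimal sketch of what an ASCII-only, byte-at-a-time case fold does to the two test strings. The helper `ascii_lower_bytes` is hypothetical and only models the idea; it is not how boost::regex implements icase internally, whose exact behaviour depends on its traits class and the active locale.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): lowercases a string one byte at a
// time, folding only ASCII 'A'..'Z'. Bytes >= 0x80, i.e. the pieces of
// multi-byte UTF-8 sequences, are copied unchanged.
std::string ascii_lower_bytes(const std::string& s)
{
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        if (c >= 'A' && c <= 'Z')
            out.push_back(static_cast<char>(c + ('a' - 'A')));
        else
            out.push_back(static_cast<char>(c));
    }
    return out;
}
```

Folding "SCH\303\226NE" this way gives "sch\303\226ne": the ASCII letters are lowered, but the second byte of Ö stays 0x96 and never becomes the 0xB6 of ö, so a byte-wise comparison against "sch\303\266ne" still fails.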

Example on LWS

#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>   // u32regex support; requires Boost.Regex built with ICU

int main()
{
   // UTF-8 byte sequences: "\303\226" is Ö (U+00D6), "\303\266" is ö (U+00F6).
   static const std::string pattern("^.*" "\303\226" ".*$");
   static const std::string   test1("SCH" "\303\226" "NE");
   static const std::string   test2("sch" "\303\266" "ne");

   // make_u32regex treats the narrow-character pattern as UTF-8.
   static const boost::u32regex exp = boost::make_u32regex(pattern, boost::regex::icase);

   // u32regex_match likewise treats the std::string input as UTF-8.
   const char *result = (boost::u32regex_match(test1, exp)) ? "Match" : "NoMatch";
   std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern
      << "\" : " << result << std::endl;
   result = (boost::u32regex_match(test2, exp)) ? "Match" : "NoMatch";
   std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern
      << "\" : " << result << std::endl;
}
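
As a rough illustration of why working on code points helps, the following sketch decodes UTF-8 into code points and compares them after a very small case fold. All three functions (`decode_utf8`, `simple_fold`, `equal_ignoring_case`) are hypothetical names introduced here. The decoder assumes well-formed input and does no validation, and the fold covers only ASCII and the Latin-1 uppercase letters À..Þ. ICU relies on the full Unicode case-folding data, so this is not what u32regex does internally, only the general idea.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into code points.
// No validation is performed; this is a sketch, not a robust decoder.
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }  // 0xxxxxxx
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // 110xxxxx
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }  // 1110xxxx
        else               { cp = b & 0x07; len = 4; }  // 11110xxx
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical, deliberately tiny case fold: ASCII A..Z and Latin-1 À..Þ
// (excluding U+00D7, the multiplication sign) map to code point + 0x20.
char32_t simple_fold(char32_t cp)
{
    if (cp >= U'A' && cp <= U'Z') return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 0x20;
    return cp;
}

// Compares two UTF-8 strings code point by code point after folding.
bool equal_ignoring_case(const std::string& a, const std::string& b)
{
    const std::vector<char32_t> x = decode_utf8(a);
    const std::vector<char32_t> y = decode_utf8(b);
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (simple_fold(x[i]) != simple_fold(y[i])) return false;
    return true;
}
```

With these helpers, "\303\226" decodes to the single code point U+00D6 and "\303\266" to U+00F6, so `equal_ignoring_case("SCH\303\226NE", "sch\303\266ne")` returns true, whereas the byte-oriented comparison cannot pair the two characters.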