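I am compiling a PCRE pattern with the UTF-8 flag enabled and trying to match a UTF-8 char* string against it, but it does not match: pcre_exec returns a negative number. I pass 65 as the subject length to pcre_exec, which is the number of characters in the string. I believe it expects the number of bytes, so I tried increasing the argument up to 70, but I still get the same result. I don't know what else could be making the match fail. Please help before I shoot myself.
(If I try without the PCRE_UTF8 flag, however, it matches, but offset vector[1] is 30, which is the index of the character just before a Unicode character in my input string.)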
#include "stdafx.h"
#include "pcre.h"
#include <pcre.h> /* PCRE lib NONE */
#include <stdio.h> /* I/O lib C89 */
#include <stdlib.h> /* Standard Lib C89 */
#include <string.h> /* Strings C89 */
#include <iostream>
int main(int argc, char *argv[])
{
pcre *reCompiled;
int pcreExecRet;
int subStrVec[30];
const char *pcreErrorStr;
int pcreErrorOffset;
const char* aStrRegex = "(\\?\\w+\\?\\s*=)?\\s*(call|exec|execute)\\s+(?<spName>\\w+)("
// params can be an empty pair of parentheses or have parameters inside them as well.
"\\(\\s*(?<params>[?\\w,]+)\\s*\\)"
// paramList along with its parentheses is optional below, so an SP call can be just "exec sp_name" for a stored proc call without any parameters.
")?";
reCompiled = pcre_compile(aStrRegex, 0, &pcreErrorStr, &pcreErrorOffset, NULL);
if(reCompiled == NULL) {
printf("ERROR: Could not compile '%s': %s\n", aStrRegex, pcreErrorStr);
exit(1);
}
const char* line = "?rt?=call SqlTxFunctionTesting(?înFîéld?,?outField?,?inOutField?)";
pcreExecRet = pcre_exec(reCompiled,
NULL,
line,
65, // length of string
0, // Start looking at this point
0, // OPTIONS
subStrVec,
30); // Length of subStrVec
printf("\nret=%d",pcreExecRet);
//int substrLen = pcre_get_substring(line, subStrVec, pcreExecRet, 1, &mantissa);
}
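A side check on the length argument: for the 8-bit library, pcre_exec takes the subject length in bytes, so a UTF-8 subject is measured with strlen rather than by counting characters. The sketch below only illustrates the difference. It assumes the subject is stored as UTF-8 (the non-ASCII letters are written as explicit byte escapes so the source-file encoding does not matter), and `kLine` and `count_code_points` are hypothetical names, not part of PCRE.

```cpp
#include <cstddef>
#include <cstring>

// The question's subject string, assumed to be UTF-8 encoded.
// "\xC3\xAE" is î (U+00EE) and "\xC3\xA9" is é (U+00E9).
const char kLine[] =
    "?rt?=call SqlTxFunctionTesting(?\xC3\xAE" "nF\xC3\xAE\xC3\xA9" "ld?,"
    "?outField?,?inOutField?)";

// Hypothetical helper: counts code points by skipping UTF-8 continuation
// bytes (those of the form 10xxxxxx). Assumes well-formed UTF-8.
std::size_t count_code_points(const char *s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For this string std::strlen(kLine) is 68 while count_code_points(kLine) is 65, so under these assumptions 68, not 65, is the byte length of the subject.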
Answer 0 (score: 1)
1)
char * q= "î";
printf("%d, %s", q[0], q);
Output:
63, ?
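The value 63 is the ASCII code of '?', which suggests the compiler replaced î with a question mark when it converted the literal. That is an assumption about the build setup; a typical cause is a source file saved in an encoding the compiler does not read as UTF-8. One way to check what a literal really contains is to dump its bytes. The helper below is a minimal sketch, and `hex_bytes` is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of a C string as two uppercase
// hex digits, separated by single spaces.
std::string hex_bytes(const char *s) {
    std::string out;
    char buf[4];
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
         *p != 0; ++p) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(*p));
        out += buf;
    }
    return out;
}
```

If the literal were stored as UTF-8, hex_bytes would report C3 AE for î. A result of 3F means the character was already turned into '?' before PCRE ever saw it.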
2) You have to rebuild PCRE with PCRE_BUILD_PCRE16 (or 32) and PCRE_SUPPORT_UTF, and use pcre16.lib and/or pcre16.dll. Then you can try this code:
pcre16 *reCompiled;
int pcreExecRet;
int subStrVec[30];
const char *pcreErrorStr;
int pcreErrorOffset;
const wchar_t* aStrRegex = L"(\\?\\w+\\?\\s*=)?\\s*(call|exec|execute)\\s+(?<spName>\\w+)("
// params can be an empty pair of parentheses or have parameters inside them as well.
L"\\(\\s*(?<params>[?,\\w\\p{L}]+)\\s*\\)"
// paramList along with its parentheses is optional below, so an SP call can be just "exec sp_name" for a stored proc call without any parameters.
L")?";
reCompiled = pcre16_compile((PCRE_SPTR16)aStrRegex, PCRE_UTF16, &pcreErrorStr, &pcreErrorOffset, NULL); // PCRE_UTF16 is the synonym of PCRE_UTF8 intended for the 16-bit library
if(reCompiled == NULL) {
printf("ERROR: Could not compile pattern at offset %d: %s\n", pcreErrorOffset, pcreErrorStr); // aStrRegex is a wide string, so it cannot be printed with %s here
exit(1);
}
const wchar_t* line = L"?rt?=call SqlTxFunctionTesting( ?inField?,?outField?,?inOutField?,?fd? )";
const wchar_t* mantissa = NULL; // pcre16_get_substring allocates the substring itself
pcreExecRet = pcre16_exec(reCompiled,
NULL,
(PCRE_SPTR16)line,
wcslen(line), // length of string
0, // Start looking at this point
0, // OPTIONS
subStrVec,
30); // Length of subStrVec
printf("\nret=%d",pcreExecRet);
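for (int i=0;i<pcreExecRet;i++){
int substrLen = pcre16_get_substring((PCRE_SPTR16)line, subStrVec, pcreExecRet, i, (PCRE_SPTR16 *)&mantissa);
if (substrLen < 0) continue; // a negative value is a PCRE error code; nothing was allocated
wprintf(L"\nret string=%ls, length=%i\n",mantissa,substrLen); // %ls is the portable wide-string conversion
pcre16_free_substring((PCRE_SPTR16)mantissa); // the substring was allocated by PCRE
}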
3) By default \w = [0-9A-Z_a-z]. It does not include non-ASCII Unicode characters.
4) This really helps: http://answers.oreilly.com/topic/215-how-to-use-unicode-code-points-properties-blocks-and-scripts-in-regular-expressions/
5) From the PCRE 8.33 source (pcre_exec.c:2251):
/* Find out if the previous and current characters are "word" characters.
It takes a bit more work in UTF-8 mode. Characters > 255 are assumed to
be "non-word" characters. Remember the earliest consulted character for
partial matching. */
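Point 3 can be illustrated without PCRE. The sketch below hand-codes the class [?\w,] from the question's pattern using the ASCII-only definition of \w, and measures how far a run of such bytes extends. It is a simplified byte-by-byte model of what that class accepts in non-UTF mode, not PCRE's actual implementation; `is_ascii_word` and `param_run_length` are hypothetical names.

```cpp
#include <cstddef>

// ASCII-only \w: [0-9A-Z_a-z].
bool is_ascii_word(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '_';
}

// Length of the leading run of bytes accepted by [?\w,].
std::size_t param_run_length(const char *s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        unsigned char c = static_cast<unsigned char>(s[n]);
        if (c != '?' && c != ',' && !is_ascii_word(c)) break;
        ++n;
    }
    return n;
}
```

For a parameter list that starts with ?î (bytes 3F C3 AE), the run stops after the first byte. That is consistent with the params group failing to match unless the class also admits letters outside ASCII, as the \p{L} in the pattern from point 2 does.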