Question

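I have a huge file of Japanese example sentences. It is set up so that one line is the sentence, and the next line consists of the words used in that sentence, delimited by {}, () and []. Basically, I want to read a line from the file, find only the words inside (), store them in a separate file, and then remove them from the string.

I'm trying to do this with a regex. Here is the text I'm working with: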

は 二十歳(はたち){２０歳} になる[01]{になりました}

Here is the code I'm using to find what is between ( and ):

std::smatch m;
std::regex e ("\(([^)]+)\)");   // matches things between ( and )

if (std::regex_search (components,m,e)) {
   printToTest(m[0].str(), "what we got"); //Prints to a test file "what we got: " << m[0].str()
   components = m.prefix().str().append(m.suffix().str());
   //components is a string
   printToTest(components, "[COMP_AFTER_REMOVAL]");
   //Prints to test file "[COMP_AFTER_REMOVAL]: " << components 
}

Here is what it should print:

what we got:はたち
[COMP_AFTER_REMOVAL]:は 二十歳(){２０歳} になる[01]{になりました}

Here is what it actually prints:

what we got:は 二十歳(はたち
[COMP_AFTER_REMOVAL]:){２０歳} になる[01]{になりました}

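It seems that は is somehow getting mixed up with ( (which makes the regex match from は all the way to the closing parenthesis). I think there is a problem with how I read the line from the file. Maybe it is somehow not being read as UTF-8. This is what I do: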

xml_document finalDoc;
string sentence;
string components;
ifstream infile;

infile.open("examples.utf");
unsigned int line = 0;
string linePos;
bool eof = infile.eof();
while (!eof && line < 1){       
    getline(infile, sentence);
    getline(infile, components);
    MakeSentences(sentence, components, finalDoc);
    line++;
}

Is anything wrong here? Any tips? Need more code? Please help. Thanks.

Answer 1

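You forgot to escape the backslashes. Inside an ordinary C++ string literal, \( is not a recognized escape sequence, so the compiler typically warns and drops the backslash. The regex engine therefore receives (([^)]+)) instead of "\(([^)]+)\)", which is not the regex you want. That pattern matches any run of characters that are not ), so it starts at the beginning of the string and stops just before the first ). This is why you get "は 二十歳(はたち" and has nothing to do with UTF-8.

You need to write "\\(([^)]+)\\)" instead.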

Why can't a regex find the "(" in a Japanese string in C++?
