Regular expression for wrapping Arabic text in paragraphs

Time: 2017-01-03 17:39:04

Tags: qt arabic qstring qregexp

Given the string:

QString unformatted =
   "Some non arabic text"
   "بعض النصوص العربية"
   "Another non arabic text"
   "النص العربي آخر";

How can I get the following result, using QRegExp or some other approach?

"<p>Some non arabic text</p>"
"<p dir='rtl'>بعض النصوص العربية</p>"
"<p>Another non arabic text</p>"
"<p dir='rtl'>النص العربي آخر</p>";

Thanks!

1 answer:

Answer 0 (score: 1)

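This function uses a regular expression to find each run of Arabic text and wraps it in <p dir='rtl'>. Non-Arabic text is left unwrapped: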

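#include <QRegExp>
#include <QString>

QString split_arabic(const QString &text)
{
    // One Arabic letter followed by at least one more Arabic letter or space.
    QRegExp rx("[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF][ \u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]+");
    QString result;
    int pos = 0;
    int last = 0;  // end of the previous match

    while ((pos = rx.indexIn(text, pos)) != -1) {
        // Copy the non-Arabic text before the match, then the wrapped match.
        result += text.mid(last, pos - last);
        result += "<p dir='rtl'>" + rx.cap(0) + "</p>";
        pos += rx.matchedLength();
        last = pos;
    }

    // Building the result in one pass avoids wrapping a repeated run twice,
    // which calling text.replace() once per match would do.
    result += text.mid(last);
    return result;
}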

Example:

QString unformatted =
            "Some non arabic text"
            "بعض النصوص العربية"
            "Another non arabic text"
            "النص العربي آخر";


qDebug()<<unformatted;
qDebug()<<split_arabic(unformatted);

Output:

"Some non arabic textبعض النصوص العربيةAnother non arabic textالنص العربي آخر"
"Some non arabic text<p dir='rtl'>بعض النصوص العربية</p>Another non arabic text<p dir='rtl'>النص العربي آخر</p>"
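
The output above wraps only the Arabic runs, while the question also asks for plain <p> tags around the non-Arabic text. Here is a minimal sketch of one way to wrap both kinds of run. It assumes plain standard C++ on UTF-32 strings rather than Qt, and the names is_arabic and wrap_runs are hypothetical helpers, not part of any library. The code point ranges are copied from the regular expression in the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for code points in the ranges used by the
// answer's regular expression (0600-065F, 066A-06EF, 06FA-06FF).
bool is_arabic(char32_t c)
{
    return (c >= 0x0600 && c <= 0x065F) ||
           (c >= 0x066A && c <= 0x06EF) ||
           (c >= 0x06FA && c <= 0x06FF);
}

// Hypothetical helper: splits the text into maximal runs of Arabic or
// non-Arabic characters and wraps each run in a paragraph tag.
// A run's kind is decided by its first character; spaces never end a run,
// so a space stays attached to the run it follows.
std::u32string wrap_runs(const std::u32string &text)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const bool rtl = is_arabic(text[i]);
        std::size_t j = i + 1;
        // Extend the run while the next character is a space or of the same kind.
        while (j < text.size() &&
               (text[j] == U' ' || is_arabic(text[j]) == rtl)) {
            ++j;
        }
        out += rtl ? U"<p dir='rtl'>" : U"<p>";
        out.append(text, i, j - i);
        out += U"</p>";
        i = j;
    }
    return out;
}
```

In a Qt program, QString::toStdU32String() and QString::fromStdU32String() (available since Qt 5.5) could bridge between QString and std::u32string. The sketch treats every character outside the listed ranges as non-Arabic, so punctuation or digits between Arabic words would start a new paragraph. Whether that is acceptable depends on the input.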