Why does this RegEx fail when a space is added to the end of a character sequence?

Date: 2015-12-16 23:31:05

Tags: javascript regex unicode

Here is some code I use in JavaScript to detect whether a string is written in a right-to-left (RTL) script:

is_right_to_left : function (text) {

      /*
       * Right-to-left Unicode blocks for modern scripts are:
       *
       * Consecutive range of the main letters:
       * U+0590  to U+05FF  - Hebrew
       * U+0600  to U+06FF  - Arabic
       * U+0700  to U+074F  - Syriac
       * U+0750  to U+077F  - Arabic Supplement
       * U+0780  to U+07BF  - Thaana
       * U+07C0  to U+07FF  - N'Ko
       * U+0800  to U+083F  - Samaritan
       *
       * Arabic Extended:
       * U+08A0  to U+08FF  - Arabic Extended-A
       *
       * Consecutive presentation forms:
       * U+FB1D  to U+FB4F  - Hebrew presentation forms
       * U+FB50  to U+FDFF  - Arabic presentation forms A
       *
       * More Arabic presentation forms:
       * U+FE70  to U+FEFF  - Arabic presentation forms B
       */

        var ltrChars        = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF',
            rtlChars        = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC',
            rtlDirCheck     = new RegExp('^[^'+ltrChars+']*['+rtlChars+']');

        return rtlDirCheck.test(text);

    }

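This works for all the tests I have run. However, if I add a space to certain character sequences from an RTL script, the test fails. For example, given ﺮﺳﻷﺍ, the function correctly detects that the string is RTL. But if I add a trailing space, the function reports that it is not an RTL script. What is wrong with my RegEx? I want to make sure a space does not throw it off. Any ideas?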

1 Answer:

Answer 0: (score: 0)

  

I do not want to allow spaces anywhere, not just at the end. How can I do that?

You want to use a negative lookahead.
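
As a minimal sketch of the idea in isolation (the helper name `hasNoWhitespace` is hypothetical and not part of this answer), a negative lookahead `(?!...)` asserts that a sub-pattern cannot match from the current position, without consuming any characters:

```javascript
// Hypothetical helper for illustration only:
// returns true if `text` is non-empty and contains no whitespace at all.
function hasNoWhitespace(text) {
    // (?!.*\s) is a negative lookahead placed right after ^.
    // It fails the match if a whitespace character can be found
    // anywhere ahead, and it consumes nothing, so `.+` still has
    // to match the whole string by itself.
    var noSpaceCheck = /^(?!.*\s).+$/;
    return noSpaceCheck.test(text);
}

console.log(hasNoWhitespace("abc"));   // true
console.log(hasNoWhitespace("ab c"));  // false (space in the middle)
console.log(hasNoWhitespace("abc "));  // false (trailing space)
```

The function below applies the same construct, with the left-to-right character ranges added to the set of forbidden characters.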



function is_right_to_left (text) {

      /*
       * Right-to-left Unicode blocks for modern scripts are:
       *
       * Consecutive range of the main letters:
       * U+0590  to U+05FF  - Hebrew
       * U+0600  to U+06FF  - Arabic
       * U+0700  to U+074F  - Syriac
       * U+0750  to U+077F  - Arabic Supplement
       * U+0780  to U+07BF  - Thaana
       * U+07C0  to U+07FF  - N'Ko
       * U+0800  to U+083F  - Samaritan
       *
       * Arabic Extended:
       * U+08A0  to U+08FF  - Arabic Extended-A
       *
       * Consecutive presentation forms:
       * U+FB1D  to U+FB4F  - Hebrew presentation forms
       * U+FB50  to U+FDFF  - Arabic presentation forms A
       *
       * More Arabic presentation forms:
       * U+FE70  to U+FEFF  - Arabic presentation forms B
       */

        var ltrChars        = 'A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02B8\\u0300-\\u0590\\u0800-\\u1FFF'+'\\u2C00-\\uFB1C\\uFDFE-\\uFE6F\\uFEFD-\\uFFFF',
            rtlChars        = '\\u0591-\\u07FF\\uFB1D-\\uFDFD\\uFE70-\\uFEFC',
            // The `+` after the RTL class lets it match one or more RTL characters instead of exactly one.
            rtlDirCheck     = new RegExp('^(?!.*['+ltrChars+'\\s]+.*)['+rtlChars+']+$');

        return rtlDirCheck.test(text);

    }
    alert(is_right_to_left("ﺮﺳﻷ ﺍ")); // false: the string contains a space




Here is a brief illustration of the regular expression:

[Image: regular expression visualization]
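
The pieces of the pattern can also be spelled out in code. The sketch below rebuilds it part by part using the character classes from the function above. The names `pattern` and `rtlOnlyCheck` are illustrative, and the `+` after the RTL class is an assumption made here so that strings longer than one character can match.

```javascript
// Character classes copied from the function above.
var ltrChars = 'A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02B8\\u0300-\\u0590\\u0800-\\u1FFF' +
               '\\u2C00-\\uFB1C\\uFDFE-\\uFE6F\\uFEFD-\\uFFFF';
var rtlChars = '\\u0591-\\u07FF\\uFB1D-\\uFDFD\\uFE70-\\uFEFC';

var pattern =
    '^' +                                   // start of the string
    '(?!.*[' + ltrChars + '\\s]+.*)' +      // negative lookahead: reject if any LTR or whitespace character lies ahead
    '[' + rtlChars + ']+' +                 // one or more RTL characters (the `+` is an assumption, see above)
    '$';                                    // end of the string

var rtlOnlyCheck = new RegExp(pattern);

// The Arabic presentation-form sample from the question, written as escapes.
console.log(rtlOnlyCheck.test('\uFEAE\uFEB3\uFEF7\uFE8D'));    // true
console.log(rtlOnlyCheck.test('\uFEAE\uFEB3\uFEF7 \uFE8D'));   // false (space in the middle)
console.log(rtlOnlyCheck.test('\uFEAE\uFEB3\uFEF7\uFE8D '));   // false (trailing space)
console.log(rtlOnlyCheck.test('abc'));                         // false (LTR letters)
```

With both anchors and the `+` in place, the RTL class on its own already rejects any string containing a character outside `rtlChars`, so in this form the lookahead mainly makes the intent explicit.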