Question

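In my MySQL database I have a column of UTF-8 strings, and I want to extract the first character of each word, for example with RegEx.

Suppose the RegEx should extract only the following characters: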

ਹਮਜਰਣਚਕਨਖਲਨ

and is given the following string:

ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥

Then the only characters extracted are:

ਹਰਚਨਲਨ

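I know that solving this involves the following steps:

Use the space as a delimiter
For each word, if it matches the RegEx of valid characters, extract the first letter (a substring of a substring)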
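To make the two steps above concrete, here is a minimal sketch outside the database, written in Python purely for illustration. The function name `first_letters` and the constant `VALID_FIRST` are made up for this example; the character class and the sample string are the ones from the question.

```python
import re

# Character class taken from the question; the name VALID_FIRST is hypothetical.
VALID_FIRST = re.compile("[ਹਮਜਰਣਚਕਨਖਲਨ]")


def first_letters(text):
    """Split on whitespace and keep the first character of each word,
    but only when that character is in the valid set."""
    letters = []
    for word in text.split():
        # re.match only tests the start of the word, i.e. its first character
        m = VALID_FIRST.match(word)
        if m:
            letters.append(m.group(0))
    return "".join(letters)


sample = "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"
result = first_letters(sample)
print(result)
```

The last word (॥੧॥) contributes nothing because its first character is not in the valid set.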

I have looked through all the similar questions and answers, and so far none of them solves my problem.

Answer 1

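I don't really know MySQL's regex syntax and limitations (I have never used it), but you could prepend a leading space to the string and match something as simple as " ([ਮਜਰਣਚਕਨਖਲਨ]{1})"

Then, if you concatenate the matched groups, you get the string "ਰਚਨਲਨ" (only "ਹ" is not matched, because it is not in the sample character class).

In C# it could look like this (working sample):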

namespace TestRegex
{
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;

    class Program
    {
        static void Main(string[] args)
        {
            // leading space(to match first word too)
            // + sample string
            var sample = " ";
            sample +=  "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"; 

            // Regex pattern that will match a space and,
            // if the next character is in the set, add it to "match group 1"
            var pattern = " ([ਮਜਰਣਚਕਨਖਲਨ]{1})";

            // select every "match group 1" from matches as array
            var result = from Match m in Regex.Matches(sample, pattern) 
                         select m.Groups[1];

            // concatenate array content into one string and
            // show it in message box to user, for example..
            MessageBox.Show(string.Concat(result)); 
        }
    }
}

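In most non-query languages it looks almost the same. In PHP, for example, you would call preg_match_all and then, in a foreach loop, append $match[i][1] (each "match group 1") from every match to the end of a single string.

Well... very simple. But that does not apply to MySQL...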
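As one more illustration of "almost the same in other languages", here is a rough Python equivalent of the C# sample. It is only a sketch: the variable names are made up, and the pattern is the one from this answer (so "ਹ" is still missing from the character class).

```python
import re

# Leading space so the first word can be matched as well, as in the C# sample.
sample = " " + "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥"

# A space followed by one character from the set; the character is group 1.
pattern = re.compile(" ([ਮਜਰਣਚਕਨਖਲਨ])")

# With exactly one capture group, findall returns the list of group-1 strings.
initials = "".join(pattern.findall(sample))
print(initials)
```

Adding "ਹ" to the character class would make the first word contribute its initial too.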

Answer 2

I finally got this working with the help of a programmer friend. I pasted the following code directly into the SQL section of the database in phpMyAdmin:

delimiter $$
drop function if exists `initials`$$
CREATE FUNCTION `initials`(str text, expr text) RETURNS text CHARSET utf8
begin
    declare result text default '';
    declare buffer text default '';
    declare i int default 1;
    if(str is null) then
        return null;
    end if;
    set buffer = trim(str);
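    -- char_length counts characters; length counts bytes, which overshoots for multi-byte UTF-8 text
    while i <= char_length(buffer) do
        if substr(buffer, i, 1) regexp expr then
            -- keep the first character of a matching run
            set result = concat( result, substr( buffer, i, 1 ));
            set i = i + 1;
            -- skip the following characters that match expr
            while i <= char_length( buffer ) and substr(buffer, i, 1) regexp expr do
                set i = i + 1;
            end while;
            -- skip the following characters that do not match expr
            while i <= char_length( buffer ) and substr(buffer, i, 1) not regexp expr do
                set i = i + 1;
            end while;
        else
            set i = i + 1;
        end if;
    end while;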
    return result;
end$$

drop function if exists `acronym`$$
CREATE FUNCTION `acronym`(str text) RETURNS text CHARSET utf8
begin
    declare result text default '';
    set result = initials( str, '[ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]' );
    return result;
end$$
delimiter ;

UPDATE scriptures SET search = acronym(scripture)

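Just to explain the last line:

scriptures is the table I want to update
search is a new, empty column I created in the table to store the result
scripture is the existing column in the scriptures table that contains all the strings I want to extract from
acronym is the previously declared function, which takes the first letter of each word that matches the RegEx [ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]

So the last line of the code goes through every row of the scripture column, applies the acronym function to that row, and stores the result in the new search column.

Perfect! Exactly what I was looking for :)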

Extracting the first character of each word in MySQL using RegEx
