Iterating over the characters of a string in MySQL

Asked: 2011-09-11 16:34:48

Tags: mysql string optimization substring distance

I have a very specific question first, but an alternative approach to my underlying problem (the second part) might help me as well.

Is there a way to address a character in a string by its index in MySQL (the way $var[2] gives you the third character in PHP)?

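The obvious way is SUBSTRING(var, 3, 1), but since my strings are 1024 characters long, I doubt that this is the fastest solution. Using SUBSTRING to take the tail of the string on each step, as in the code below, made no difference in performance either. Is there a way to iterate over a string (something like shifting off the first element)?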

CREATE FUNCTION hashDiff( hash1 TEXT(1024), hash2 TEXT(1024), threshold INT)
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE diff, x, b1, b2 INT;
    SET diff = 0;
    SET x = 0;
    -- Stop early once the accumulated distance reaches the threshold.
    WHILE (x < 1024 AND diff < threshold) DO
        SET b1 = ASCII(hash1); -- ASCII() uses the first character only
        SET b2 = ASCII(hash2);
        -- Drop the first character so the next one moves to the front.
        SET hash1 = SUBSTRING(hash1, 2);
        SET hash2 = SUBSTRING(hash2, 2);
        SET diff = diff + ((b1 - b2) * (b1 - b2));
        SET x = x + 1;
    END WHILE;
    RETURN diff;
END

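In case it is not obvious from the code: I am trying to write a stored function that calculates the difference, or distance, between two hashes. The difference is the sum of the squared distances between corresponding characters (i.e. hashDiff(AA, AC) = (65-65)² + (65-67)² = 4). A first major performance gain came from introducing a threshold that aborts the calculation once the hashes are already too different. But since MySQL is not my "everyday" language, I am stuck at this point looking for further optimizations. For completeness, two sample hashes: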

  

YAAAAAAYAAAYAAVAAQAARAOAAOAQASAQAMAKAKAJIAJAJIAHAHIAKJAIIAHHAHIIAIHGAGFFAGGFEAFEEEEAEDDDDDAEEEEDEEEFAFFFFFFEFFFEFFFFFGFEEFFEEEFFFJEFFEEEEEEELFFFFEEFJEEEEDIEEEEEIEEEEHEEEJEEFKFEFKGGFNHGOIIJTJKYONYNMTGHNHHQISJJQIKWLXJJSMYRQWJOGKDDFCCBBAAAAAAAAAAAAAAAAAAAAAAAYAAAAAAYAAAYAAWAARAASASAAQARAUAYAYATAOALKAJAJIAIAHHAHGAGFAFFAEFFAEFFAFFFAEEFFAFEEEDADEDDDDADDDCDDDDDAEEEFEEEEDDDEEEDEDDEEEFEFFGGFMFHGFFFGFFFLGHGGHGGNHHGGGOHGHGHMGGFGMFFFMFGFLFFFMGFFMGGMGGGNGGMGGLGGLGGMGGLEIEEHDCGCGCDGDGDCGDFCECCECECECECFCECFCFCFCFCFCGCJGYCYAAAAAAYAAAYAAUAATAAUAUAAUARARAQAPAPASARRAPARQAPAQQAQQAQSAKMATKKAIIHAIHGAGGGGAGHHGGAGGFGFFAFFGEFFFFFAFFGFGGGFFFEEFGFFGGFGGHIJJLKLWLKJJIJJJKJRLJKLKKKUKLLKKUMMKJIQIIIISKJJWKLLXMLMYMLNYMMYMLLWJIQIINFGKFFKEEIDHEDHDDFCECCFDECCFCFDGCDGCGCGEGCDCECECFDFCGDGCIEKEOAYNFBREUXKPQMMQTKTMMNJLPPVYYYTOUOPOLLJKKJJJIJIMJJJLIJJLLJIIHHIHHHIGHIHIHJHHHJHHIHGHGHFGHGFFEFEEEFEFEFFGGHIHIHGHGHHIIIIHIIJMNLONKLKKKKKKKMLKKLONMKOOOMLOPONMNMKKLLKKLMNKLMMMNMOPPOORPORSSVRTSSRTRRTSSTTXSTQRPONOKKLKLJMKJJIJIIHHHIIIJHI JIJJIJIKJIMWMYYDAAAAAAAAAAA

     

AAAAAAAAAAABAABAACAACACAACADADAEADADADADDAEAEEAEAFEAEEAEFAFGAGGGAGGGAHHHAHIIIAIHIJHAIIHIHHAJIHIJIJKJAJJJIKJJJJKKJKJKKLKLKLLMMMNNMYOOOOOOPOONYOONONNPYNOOOPYOOPPPYNONNYMLLWLLKUJIISHIHOGGMFGFLFFMGGLFGLGFLFFKFKFFLEEKFLEFJFKFGNGNHLFHJFIEGDIEKGOIRFGBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABABACACDACADDACADDADDAEDAFEAEEFAFFFAFFGAFGGGAIGHHIAHIHHHHAHIHIIJJIIAIIIIJIJKJIIIIJJHIIHIIIIJIIIIRJJJJKJJJJLVKLLKLLKXLMMKMXMLLLMWMMMMYMNLYMNNYNNMYMMNYMLYLMLXKJRIHPHIMGGMFEJEJEEIEEHDGCDFCFDCFCECECCEBEBECFDGCFDNGLDBAAAAAAAAAAAAAAAAAAAAAABAAAAABAABABAACACACACACACACADDADAEEAFAFGAFGAHGAGGAGGHAGGIAIHJAJJJJAJKKKKAMLMNNNANOMMNNMMNAONMNOOOMOOPOMNOMMNPOOPPPPRQQYPPRPPPPPNOYLLMMMMLYLMLMLYLMLMMYLNNMYNLLWMLKXLLLUKIKQIIQGHHPFHNGFLFFLGFJEEJEIDDIDCHDFCDGCFCCFCECECCECFCGDGDHDHDIFIDEBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAABBBBBCBCCCCDCCCCCCCCDDDDEDEEEEFFDEGGHGHHHGHHHHHHIIJJJJJIJJJJJJIKJJKLKKMMNMMMMMMMNNNNNNLNNONPONNNOOOOPQQQRSSSSSSUTSTUUUVWVVXUYXWVXVXWYVYWYVYYUWVUTTSSPQPQOPOPONONOMONOOONNNMMNLJJKJIIJHHGGGFHFGFFFFEEEDD EEEEFGGIGJLRNEAAAAAAAAAAAAA

Any help or hints would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

The only way to get something like an array is to use a temporary table together with a cursor/result set.

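The problem is that you would still need to iterate over the string and split it apart with SUBSTRING. As far as I know, there is no 'wordwrap' or 'explode' function to cut a string into pieces.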
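
For reference, the SUBSTRING-based iteration can also be written with a position counter instead of repeatedly cutting off the head of each string. The following is only a minimal sketch under some assumptions: the name hashDiffIndexed and the variables pos, len and d are made up for illustration, and the loop bound is taken from the shorter of the two strings rather than the fixed 1024 used in the question. Whether this is any faster than the original would have to be measured; the asker reported no difference between the two styles.

```sql
DELIMITER //

-- Hypothetical variant of the question's hashDiff: reads characters by
-- position instead of shortening the strings on every iteration.
CREATE FUNCTION hashDiffIndexed(hash1 TEXT, hash2 TEXT, threshold INT)
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE diff INT DEFAULT 0;
    DECLARE pos INT DEFAULT 1;  -- SUBSTRING positions are 1-based
    -- Assumption: compare only up to the length of the shorter string.
    DECLARE len INT DEFAULT LEAST(CHAR_LENGTH(hash1), CHAR_LENGTH(hash2));
    DECLARE d INT;

    -- Stop early once the accumulated distance reaches the threshold.
    WHILE pos <= len AND diff < threshold DO
        SET d = ASCII(SUBSTRING(hash1, pos, 1)) - ASCII(SUBSTRING(hash2, pos, 1));
        SET diff = diff + d * d;
        SET pos = pos + 1;
    END WHILE;

    RETURN diff;
END //

DELIMITER ;

-- Usage, matching the worked example from the question:
SELECT hashDiffIndexed('AA', 'AC', 1000);  -- (65-65)^2 + (65-67)^2 = 4
```

The DELIMITER lines are only needed in the mysql command-line client, so that the semicolons inside the function body do not end the CREATE FUNCTION statement early.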