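Count the number of unique characters in a string

Time: 2015-04-30 12:02:59

Tags: mysql sql database

I'm looking for a SQL statement that counts the number of unique characters in a string.

e.g.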

3333333333 -> returns 1
1113333333 -> returns 2
1112222444 -> returns 3

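I ran some tests with REGEX and the MySQL string functions, but I couldn't find a solution.

6 answers:

Answer 0 (score: 6)

Is this for fun?

SQL is all about processing sets of rows, so if we can convert a 'word' into a set of characters as rows, then we can use the 'group' functions to do useful things.

Using a 'relational database engine' for simple character manipulation feels wrong. Still, is it possible to answer your question with just SQL? Yes, it is...

Now, I always keep a table with a single integer column and about 500 rows holding the ascending sequence 1 .. 500. It is called 'integerseries'. It is a really small table that is used a lot, so it stays cached in memory. It is designed to replace the from 'select 1 ... union ...' text in queries.

By using it in a cross join (or any inner join), it is very useful for generating sequential rows (a table) of anything that can be calculated from an integer. I use it to generate the days of a year, parse comma-delimited strings, etc.

Now, the SQL mid function can be used to return the character at a given position. By using the 'integerseries' table I can 'easily' convert a 'word' into a table of characters with one row per character. Then use the 'group' functions...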

SET @word='Hello World';

SELECT charAtIdx, COUNT(charAtIdx)
FROM (SELECT charIdx.id,
    MID(@word, charIdx.id, 1) AS charAtIdx
    FROM integerseries AS charIdx
    -- CHAR_LENGTH counts characters; LENGTH would count bytes
    WHERE charIdx.id <= CHAR_LENGTH(@word)
    ORDER BY charIdx.id ASC
    ) wordLetters
GROUP BY
   wordLetters.charAtIdx
ORDER BY charAtIdx ASC;

Output:

charAtIdx  count(charAtIdx)  
---------  ------------------
                            1
d                           1
e                           1
H                           1
l                           3
o                           2
r                           1
W                           1

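Note: the number of rows in the output is the number of distinct characters in the string. Therefore, counting the output rows gives the number of 'different letters'.

This observation is used in the final query.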
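As a quick check of that observation for a single string, the grouping can be skipped and the distinct characters counted directly. This is only a sketch: it assumes the integerseries table described above has at least as many rows as the string has characters, and the alias uniqueChars is made up for illustration.

```sql
SET @word = 'Hello World';

-- One row per character position, then count the distinct characters.
SELECT COUNT(DISTINCT MID(@word, iseq.id, 1)) AS uniqueChars
FROM integerseries AS iseq
WHERE iseq.id <= CHAR_LENGTH(@word);
```

For 'Hello World' this should give 8, the number of rows in the output above.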

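The final query:

The interesting point here is moving the 'integerseries' 'cross join' restriction (1 .. length(word)) into the actual 'join' rather than doing it in the where clause. This gives the optimizer a clue about how to restrict the data produced when doing the join.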

SELECT
   wordLetterCounts.wordId,
   wordLetterCounts.word,
   COUNT(wordLetterCounts.wordId) AS letterCount
FROM
     (SELECT words.id AS wordId,
             words.word AS word,
             -- MIN() keeps the query valid under ONLY_FULL_GROUP_BY:
             -- first position at which the character occurs
             MIN(iseq.id) AS charPos,
             MID(words.word, iseq.id, 1) AS charAtPos,
             COUNT(MID(words.word, iseq.id, 1)) AS charAtPosCount
     FROM
          words
          JOIN integerseries AS iseq
               ON iseq.id BETWEEN 1 AND words.wordlen
      GROUP BY
            words.id,
            MID(words.word, iseq.id, 1)
      ) AS wordLetterCounts
GROUP BY
   wordLetterCounts.wordId,
   wordLetterCounts.word;

Output:

wordId  word                  letterCount  
------  --------------------  -------------
     1  3333333333                        1
     2  1113333333                        2
     3  1112222444                        3
     4  Hello World                       8
     5  funny - not so much?             13

Words table and data:

CREATE TABLE `words` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `word` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL,
  `wordlen` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

/*Data for the table `words` */

insert  into `words`(`id`,`word`,`wordlen`) values (1,'3333333333',10);
insert  into `words`(`id`,`word`,`wordlen`) values (2,'1113333333',10);
insert  into `words`(`id`,`word`,`wordlen`) values (3,'1112222444',10);
insert  into `words`(`id`,`word`,`wordlen`) values (4,'Hello World',11);
insert  into `words`(`id`,`word`,`wordlen`) values (5,'funny - not so much?',20);

Integerseries table: range 1 .. 30 for this example.

CREATE TABLE `integerseries` (
  `id` int(11) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
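The rows of integerseries are not listed here. One way to fill the table with 1 .. 30 might look like the sketch below. It assumes MySQL 8.0 or later, because it relies on WITH RECURSIVE, which did not exist when this answer was written; the CTE name seq and its column n are made up for illustration.

```sql
-- Generate 1 .. 30 with a recursive CTE and insert the values.
-- Raise the upper bound (30) for longer strings; the default
-- cte_max_recursion_depth of 1000 limits how far this can go.
INSERT INTO integerseries (id)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 30
)
SELECT n FROM seq;
```

On older versions the same rows can be inserted with plain INSERT statements.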

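Answer 1 (score: 4)

There is no direct or easy way to do this. You may need to write a stored function to do the job and check for every character you expect in the data. Here is an example for digits only, which could be extended to all characters in a stored function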

mysql> select * from test ;
+------------+
| val        |
+------------+
| 11111111   |
| 111222222  |
| 1113333222 |
+------------+


select 
val, 
sum(case when locate('1',val) > 0 then 1 else 0 end ) 
+ sum( case when locate('2',val) > 0 then 1 else 0 end)
+ sum(case when locate('3',val) > 0 then 1 else 0 end)
+sum(case when locate('4',val) > 0 then 1 else 0 end ) as occurence
from test group by val


+------------+-----------+
| val        | occurence |
+------------+-----------+
| 11111111   |         1 |
| 111222222  |         2 |
| 1113333222 |         3 |
+------------+-----------+

Or, if you have enough time, create a lookup table with all the characters you can think of, and do the query in 2 lines

mysql> select * from test ;
+------------+
| val        |
+------------+
| 11111111   |
| 111222222  |
| 1113333222 |
+------------+
3 rows in set (0.00 sec)

mysql> select * from look_up ;
+------+------+
| id   | val  |
+------+------+
|    1 | 1    |
|    2 | 2    |
|    3 | 3    |
|    4 | 4    |
+------+------+
4 rows in set (0.00 sec)

select 
t1.val, 
sum(case when locate(t2.val,t1.val) > 0 then 1 else 0 end ) as occ 
from test t1,(select * from look_up)t2 
group by t1.val ;

+------------+------+
| val        | occ  |
+------------+------+
| 11111111   |    1 |
| 111222222  |    2 |
| 1113333222 |    3 |
+------------+------+
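One caveat with both queries above: SUM ... GROUP BY val adds 1 per matching row, so a value that is stored twice in test would report double the count. A possible variation, assuming the same test and look_up tables, counts the distinct lookup characters instead:

```sql
-- Count each lookup character at most once per value,
-- even if the same val appears in several rows of test.
SELECT
  t1.val,
  COUNT(DISTINCT t2.val) AS occ
FROM test t1
JOIN look_up t2 ON LOCATE(t2.val, t1.val) > 0
GROUP BY t1.val;
```

Because this is an inner join, a value that contains none of the lookup characters does not appear in the result at all.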

Answer 2 (score: 3)

One thing you can do is keep a table of all your characters, such as:

mysql> select * from chars;
+----+------+
| id | c    |
+----+------+
|  1 | 1    |
|  2 | 2    |
|  3 | 3    |
|  4 | 4    |
+----+------+

If your table of words looks like this:

mysql> select * from words;
+----+-----------+
| id | word      |
+----+-----------+
|  1 | 111222333 |
|  2 | 11111111  |
|  3 | 2222111   |
|  4 | 5555555   |
+----+-----------+

Then you can join these tables on the condition that the character occurs in the word, and get the count, like this:

mysql> select word, count(c) from words w inner join chars c on locate(c.c, word) group by word;
+-----------+----------+
| word      | count(c) |
+-----------+----------+
| 11111111  |        1 |
| 111222333 |        3 |
| 2222111   |        2 |
+-----------+----------+
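Note that word 4 (5555555) is missing from the result: 5 is not in chars, so the inner join finds no match for it. A sketch of a variant that keeps such words, assuming the same words and chars tables (the alias unique_chars is made up for illustration):

```sql
-- LEFT JOIN keeps words that match no row in chars.
-- COUNT(c.c) ignores the NULL produced for them, so they report 0.
SELECT w.word, COUNT(c.c) AS unique_chars
FROM words w
LEFT JOIN chars c ON LOCATE(c.c, w.word) > 0
GROUP BY w.id, w.word;
```

A count of 0 here does not mean the word is empty. It means the chars table is missing the characters that word uses.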

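Answer 3 (score: 0)

I don't think this is a job for MySQL, but you can do anything if you try hard enough ;)

I don't like this answer, but it works, and it isn't too ugly if you only have digits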

SELECT 
    CASE WHEN yourcolumn LIKE '%1%' THEN 1 ELSE 0 END + 
    CASE WHEN yourcolumn LIKE '%2%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%3%' THEN 1 ELSE 0 END + 
    CASE WHEN yourcolumn LIKE '%4%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%5%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%6%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%7%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%8%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%9%' THEN 1 ELSE 0 END +
    CASE WHEN yourcolumn LIKE '%0%' THEN 1 ELSE 0 END
FROM yourtable

Answer 4 (score: 0)

DROP FUNCTION IF EXISTS test.count_chrs;

DELIMITER $$
CREATE DEFINER=`test`@`localhost` FUNCTION `count_chrs`(s VARCHAR(100)) RETURNS INT
  DETERMINISTIC
  BEGIN
    DECLARE string_length INT;
    -- VARCHAR keeps trailing spaces, which CHAR would strip
    DECLARE unique_string VARCHAR(100) DEFAULT '';
    DECLARE count_unique INT DEFAULT 0;
    DECLARE current_char INT DEFAULT 1;
    SET string_length = CHAR_LENGTH(s);

    -- Walk the string; count each character the first time it is seen
    WHILE current_char <= string_length DO
      IF LOCATE(SUBSTR(s, current_char, 1), unique_string) = 0 THEN
        SET count_unique = count_unique + 1;
        SET unique_string = CONCAT(unique_string, SUBSTR(s, current_char, 1));
      END IF;

      SET current_char = current_char + 1;
    END WHILE;

    RETURN count_unique;
  END$$
DELIMITER ;

I'm new to MySQL function declarations, but this may point you in the right direction.
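Once the function exists, it can be called like any built-in function. A small usage sketch with the strings from the question (the column aliases a, b and c are made up for illustration):

```sql
-- Each call returns the number of distinct characters in its argument.
SELECT count_chrs('3333333333') AS a,
       count_chrs('1113333333') AS b,
       count_chrs('1112222444') AS c;
```

This should return 1, 2 and 3. Note that LOCATE compares using the collation in effect, so with a case-insensitive collation 'A' and 'a' count as the same character.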

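Answer 5 (score: 0)

There are several levels of subquery that may be off-putting, and it would need to be extended for columns with longer strings, but it is pretty straightforward once you turn the string on its side with UNPIVOT.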

declare @Data table (RowID nvarchar(1), StringData nvarchar(10))
insert into @Data values (N'1', N'3333333333'),(N'2', N'1113333333'),(N'3', N'1112222444')

select  t1.StringData, cast(t2.CharCount as nvarchar) as 'Unique Characters in String'
from    @Data t1
        inner join (
            select  RowID,count(*) as 'CharCount'
            from    (
                    select  distinct RowID, [char]
                    from    (
                        select  RowID,
                            substring(StringData,1,1) as '1',
                            substring(StringData,2,1) as '2',
                            substring(StringData,3,1) as '3',
                            substring(StringData,4,1) as '4',
                            substring(StringData,5,1) as '5',
                            substring(StringData,6,1) as '6',
                            substring(StringData,7,1) as '7',
                            substring(StringData,8,1) as '8',
                            substring(StringData,9,1) as '9',
                            substring(StringData,10,1) as '10'
                        from    @Data
                        ) Unpivd
                    unpivot ( [char] for chars in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10])) unpiv
                    where [char] <> ''
                ) CharCounter
            group by RowID
            ) t2
            on t2.RowID = t1.RowID

Returns:

StringData  Unique Characters in String
3333333333  1
1113333333  2
1112222444  3
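The query above is written in SQL Server syntax (table variables and UNPIVOT), which MySQL does not support. A rough MySQL equivalent that needs neither a fixed number of substring columns nor a helper table might look like the sketch below. It assumes MySQL 8.0 or later (for WITH RECURSIVE) and the words table from answer 0; the names positions, pos and unique_chars are made up for illustration.

```sql
-- Expand every word into one row per character position,
-- then count the distinct characters per word.
WITH RECURSIVE positions (id, word, pos) AS (
    SELECT id, word, 1
    FROM words
    WHERE CHAR_LENGTH(word) > 0
    UNION ALL
    SELECT id, word, pos + 1
    FROM positions
    WHERE pos < CHAR_LENGTH(word)
)
SELECT id, word, COUNT(DISTINCT SUBSTRING(word, pos, 1)) AS unique_chars
FROM positions
GROUP BY id, word
ORDER BY id;
```

For the sample rows in answer 0 this should give the same counts as the letterCount column there. Strings longer than the default cte_max_recursion_depth of 1000 would need that setting raised.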