How to use a dictionary table in a MySQL query to detect the two words in the string "appletopic"

Asked: 2016-03-12 23:46:18

Tags: mysql postgresql

I have a table of strings like this:

id  code
1   appletopic
2   hellomore
3   maybebasic

Each row is a concatenation of 2 words.

and a dictionary table:

id  name   frequency
1   apple  300
2   you    600
3   topic  23
4   hello  234

I have to produce this result table in MySQL:

id | code       | firstWord | SecondWord
1  | appletopic | apple     | topic
2  | hellomore  | hello     | more
3  | maybebasic | maybe     | basic

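If a string can be split into 2 words in more than one way, the split with the highest-frequency words should be chosen.

How can I do this in MySQL?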

2 Answers:

Answer 0: (score: 1)

Here is a solution using LEFT JOIN:

-- SQL wanted
SELECT 
    s.id, s.code, 
    d.name FirstWord, SUBSTRING_INDEX(s.code, d.name, -1) SecondWord
FROM string s LEFT JOIN dict d ON s.code LIKE CONCAT(d.name, '%');

Here is the full demo.

SQL:

-- data
create table string(id int, code char(100));
insert into string values
(1, 'appletopic'),
(2, 'hellomore'),
(3, 'maybebasic');
create table dict(id int, name char(100), frequency int);
insert into dict values
(1, 'apple', 300 ),
(2, 'you', 600 ),
(3, 'topic', 23),
(4, 'hello', 234);
SELECT * FROM string;
SELECT * FROM dict;

-- SQL wanted
SELECT 
    s.id, s.code, 
    d.name FirstWord, SUBSTRING_INDEX(s.code, d.name, -1) SecondWord
FROM string s LEFT JOIN dict d ON s.code LIKE CONCAT(d.name, '%');

Output:

mysql> SELECT * FROM dict;
+------+-------+-----------+
| id   | name  | frequency |
+------+-------+-----------+
|    1 | apple |       300 |
|    2 | you   |       600 |
|    3 | topic |        23 |
|    4 | hello |       234 |
+------+-------+-----------+
4 rows in set (0.00 sec)

mysql> SELECT
    -> s.id, s.code,
    -> d.name FirstWord, SUBSTRING_INDEX(s.code, d.name, -1) SecondWord
    -> FROM string s LEFT JOIN dict d ON s.code LIKE CONCAT(d.name, '%');
+------+------------+-----------+------------+
| id   | code       | FirstWord | SecondWord |
+------+------------+-----------+------------+
|    1 | appletopic | apple     | topic      |
|    2 | hellomore  | hello     | more       |
|    3 | maybebasic | NULL      | NULL       |
+------+------------+-----------+------------+
3 rows in set (0.00 sec)
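
The query above returns one row per matching prefix and does not use the frequency column, so a code that starts with more than one dictionary word would come back several times. A minimal sketch of keeping only the highest-frequency prefix follows. It runs the SQL through Python's sqlite3 module purely so that it can be executed without a server, which is an assumption of this sketch and not part of the original answer. The extra dict row ('app', 50) is hypothetical, and the remainder is taken by position with SUBSTR instead of SUBSTRING_INDEX.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string(id INTEGER, code TEXT);
INSERT INTO string VALUES
  (1, 'appletopic'), (2, 'hellomore'), (3, 'maybebasic');
CREATE TABLE dict(id INTEGER, name TEXT, frequency INTEGER);
INSERT INTO dict VALUES
  (1, 'apple', 300), (2, 'you', 600), (3, 'topic', 23), (4, 'hello', 234),
  (5, 'app', 50);  -- hypothetical row: a second, rarer prefix of 'appletopic'
""")

QUERY = """
SELECT s.id, s.code,
       d.name AS FirstWord,
       SUBSTR(s.code, LENGTH(d.name) + 1) AS SecondWord
FROM string s
LEFT JOIN dict d
       -- the code must start with the dictionary word ...
       ON SUBSTR(s.code, 1, LENGTH(d.name)) = d.name
       -- ... and that word must be the most frequent matching prefix
      AND d.frequency = (SELECT MAX(d2.frequency)
                         FROM dict d2
                         WHERE SUBSTR(s.code, 1, LENGTH(d2.name)) = d2.name)
ORDER BY s.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

SUBSTR and LENGTH also exist in MySQL, where LENGTH counts bytes and CHAR_LENGTH counts characters, so CHAR_LENGTH is the safer choice for multi-byte data. Prefixes that tie on frequency would still produce more than one row, and the second word is not checked against the dictionary here.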

Answer 1: (score: 0)

Presumably you have two tables structured as follows:

CREATE TABLE `codes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `code` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

CREATE TABLE `freqs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `frequency` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

And we have some rows of data like this:

+----+------------+
| id | code       |
+----+------------+
|  1 | appletopic |
|  2 | hellomore  |
|  3 | maybebasic |
+----+------------+
+----+-------+-----------+
| id | name  | frequency |
+----+-------+-----------+
|  1 | apple |       300 |
|  2 | you   |       600 |
|  3 | topic |        23 |
|  4 | hello |       234 |
+----+-------+-----------+

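You can get the output with the following query. Basically, you join the codes table to the freqs table twice and check whether substrings of the code match names in freqs. Note that MySQL's SUBSTRING is 1-indexed.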

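-- LEFT JOINs keep codes without a dictionary match; their words come back as NULL
SELECT codes.id, codes.code, t1.name, t2.name FROM codes 
LEFT JOIN freqs AS t1 ON 
     -- first word: the code starts with t1.name
     SUBSTRING(codes.code, 1, CHAR_LENGTH(t1.name)) = t1.name
LEFT JOIN freqs AS t2 ON 
     -- second word: everything after the first word equals t2.name
     SUBSTRING(codes.code, CHAR_LENGTH(t1.name)+1, CHAR_LENGTH(codes.code)) = t2.name;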

Final result:

+----+------------+-------+-------+
| id | code       | name  | name  |
+----+------------+-------+-------+
|  1 | appletopic | apple | topic |
|  2 | hellomore  | hello | NULL  |
|  3 | maybebasic | NULL  | NULL  |
+----+------------+-------+-------+
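
To cross-check a query result on a small data set, the same split can be computed in application code. The sketch below is plain Python, not MySQL. It tries every split point, keeps the splits whose two halves are both dictionary words and, as an assumed reading of the "highest frequency" rule in the question, prefers the pair with the largest combined frequency. The function name split_code and every frequency marked hypothetical are illustrations and do not come from the original post.

```python
from typing import Dict, Optional, Tuple


def split_code(code: str, freq: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return the (first, second) dictionary words that concatenate to `code`.

    If several splits are valid, the pair with the highest combined
    frequency wins (an assumed interpretation). Returns None when no
    split consists of two dictionary words.
    """
    best: Optional[Tuple[str, str]] = None
    best_score = 0
    # A split point i leaves at least one character on each side.
    for i in range(1, len(code)):
        first, second = code[:i], code[i:]
        if first in freq and second in freq:
            score = freq[first] + freq[second]
            if best is None or score > best_score:
                best, best_score = (first, second), score
    return best


freq = {
    # rows from the dictionary table in the question
    "apple": 300, "you": 600, "topic": 23, "hello": 234,
    # hypothetical rows, added so that every sample code can be split
    "more": 120, "maybe": 80, "basic": 40,
    # hypothetical rows for an ambiguous code: no+where vs. now+here
    "no": 500, "where": 200, "now": 400, "here": 250,
}

results = {
    code: split_code(code, freq)
    for code in ["appletopic", "hellomore", "maybebasic", "nowhere"]
}
for code, words in results.items():
    print(code, words)
```

With these hypothetical frequencies, "nowhere" resolves to no + where (500 + 200) over now + here (400 + 250). A different rule, such as ranking by the first word's frequency only, needs a change to the score line alone.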