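How do I select a count of distinct values from data that is stored as comma-separated values in MySQL? I will be using PHP to output the data from MySQL in the end.
What is in there are the tags for each post. So in the end, I am trying to output the data the way Stack Overflow does for its tags, like this: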
tag-name x 5
This is what the data in the table looks like (sorry about the content, but it is a recipe site).
"postId" "tags" "category-code"
"1" "pho,pork" "1"
"2" "fried-rice,chicken" "1"
"3" "fried-rice,pork" "1"
"4" "chicken-calzone,chicken" "1"
"5" "fettuccine,chicken" "1"
"6" "spaghetti,chicken" "1"
"7" "spaghetti,chorizo" "1"
"8" "spaghetti,meat-balls" "1"
"9" "miso-soup" "1"
"10" "chanko-nabe" "1"
"11" "chicken-manchurian,chicken,manchurain" "1"
"12" "pork-manchurian,pork,manchurain" "1"
"13" "sweet-and-sour-pork,pork" "1"
"14" "peking-duck,duck" "1"
Output:
chicken 5 // occurs 5 times in the data above
pork 4 // occurs 4 times in the data above
spaghetti 3 // and so on
fried-rice 2
manchurain 2
pho 1
chicken-calzone 1
fettuccine 1
chorizo 1
meat-balls 1
miso-soup 1
chanko-nabe 1
chicken-manchurian 1
pork-manchurian 1
sweet-and-sour-pork 1
peking-duck 1
duck 1
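I am trying to select a count of all the distinct values in there, but since the data is comma-separated, there appears to be no way to do this; select distinct will not work.
Can you think of a good way, in either MySQL or PHP, to get the output shown above?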
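Answer 0 (score: 13)
I don't really know how to turn a horizontal list of comma-separated values into a list of rows without creating a table that contains numbers, as many numbers as you may have comma-separated values. If you can create this table, here is my answer: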
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) AS one_tag,
COUNT(*) AS cnt
FROM (
SELECT
GROUP_CONCAT(tags separator ',') AS all_tags,
LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
GROUP BY one_tag
ORDER BY cnt DESC;
Returns:
+---------------------+-----+
| one_tag | cnt |
+---------------------+-----+
| chicken | 5 |
| pork | 4 |
| spaghetti | 3 |
| fried-rice | 2 |
| manchurain | 2 |
| pho | 1 |
| chicken-calzone | 1 |
| fettuccine | 1 |
| chorizo | 1 |
| meat-balls | 1 |
| miso-soup | 1 |
| chanko-nabe | 1 |
| chicken-manchurian | 1 |
| pork-manchurian | 1 |
| sweet-and-sour-pork | 1 |
| peking-duck | 1 |
| duck | 1 |
+---------------------+-----+
17 rows in set (0.01 sec)
Let's build your schema:
CREATE TABLE test (
id INT PRIMARY KEY,
tags VARCHAR(255)
);
INSERT INTO test VALUES
("1", "pho,pork"),
("2", "fried-rice,chicken"),
("3", "fried-rice,pork"),
("4", "chicken-calzone,chicken"),
("5", "fettuccine,chicken"),
("6", "spaghetti,chicken"),
("7", "spaghetti,chorizo"),
("8", "spaghetti,meat-balls"),
("9", "miso-soup"),
("10", "chanko-nabe"),
("11", "chicken-manchurian,chicken,manchurain"),
("12", "pork-manchurian,pork,manchurain"),
("13", "sweet-and-sour-pork,pork"),
("14", "peking-duck,duck");
We want all the tags on a single line, so we use GROUP_CONCAT to do the job:
SELECT GROUP_CONCAT(tags SEPARATOR ',') FROM test;
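Returns all the tags, separated by commas:
pho,pork,fried-rice,chicken,fried-rice,pork,chicken-calzone,chicken,fettuccine,chicken,spaghetti,chicken,spaghetti,chorizo,spaghetti,meat-balls,miso-soup,chanko-nabe,chicken-manchurian,chicken,manchurain,pork-manchurian,pork,manchurain,sweet-and-sour-pork,pork,peking-duck,duck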
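To count all the tags, we take the length of the full tag list and subtract the length of the same list after every , has been replaced with an empty string. That difference is the number of commas. We add 1 because a separator sits between two values.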
SELECT LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test;
Returns:
+------------+
| count_tags |
+------------+
| 28 |
+------------+
1 row in set (0.00 sec)
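To make the arithmetic concrete, here is a minimal Python sketch of the same length-difference trick. It is only an illustration of the counting idea, not MySQL itself, and the three sample strings are an assumed subset of the table.

```python
# Assumed sample: three of the tag strings from the table (not the full data).
rows = ["pho,pork", "fried-rice,chicken", "miso-soup"]

# GROUP_CONCAT(tags SEPARATOR ',') behaves roughly like joining the rows with commas.
all_tags = ",".join(rows)

# LENGTH(s) - LENGTH(REPLACE(s, ',', '')) is the number of commas in s.
comma_count = len(all_tags) - len(all_tags.replace(",", ""))

# One more tag than there are separators.
count_tags = comma_count + 1

print(all_tags)
print(count_tags)
```

The same formula only holds when no tag is empty and no tag contains a comma itself.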
We use the SUBSTRING_INDEX function to pull a single tag out of the list:
-- returns the string until the 2nd delimiter's occurrence from left to right: a,b
SELECT SUBSTRING_INDEX('a,b,c', ',', 2);
-- returns the string until the 1st delimiter, from right to left: c
SELECT SUBSTRING_INDEX('a,b,c', ',', -1);
-- we need both to get: b (with 2 being the tag number)
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1);
With this logic, to get the 3rd tag in our list, we use:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1)
FROM test;
Returns:
+-------------------------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1) |
+-------------------------------------------------------------------------------------+
| fried-rice |
+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
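If the nested call is hard to read, the following Python sketch mimics it. The helper `substring_index` is a hypothetical stand-in written for this illustration (it assumes a non-empty delimiter) and is not part of any library.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything after the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


def nth_tag(all_tags, n):
    """Keep the first n tags, then take the last one of those."""
    return substring_index(substring_index(all_tags, ",", n), ",", -1)


print(substring_index("a,b,c", ",", 2))
print(substring_index("a,b,c", ",", -1))
print(nth_tag("pho,pork,fried-rice,chicken", 3))
```

Note that asking for a position past the end of the list simply returns the last tag again, which is why the position has to be capped at the number of tags.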
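My idea is a little tricky:
We will create a table that contains every number from 1 to the maximum number of tags you may have in your list. If you can have 1M values, create 1M entries, from 1 to 1,000,000. For 100 tags, this would be: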
CREATE TABLE numbers (
num INT PRIMARY KEY
);
INSERT INTO numbers VALUES
( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 ),
( 11 ), ( 12 ), ( 13 ), ( 14 ), ( 15 ), ( 16 ), ( 17 ), ( 18 ), ( 19 ), ( 20 ),
( 21 ), ( 22 ), ( 23 ), ( 24 ), ( 25 ), ( 26 ), ( 27 ), ( 28 ), ( 29 ), ( 30 ),
( 31 ), ( 32 ), ( 33 ), ( 34 ), ( 35 ), ( 36 ), ( 37 ), ( 38 ), ( 39 ), ( 40 ),
( 41 ), ( 42 ), ( 43 ), ( 44 ), ( 45 ), ( 46 ), ( 47 ), ( 48 ), ( 49 ), ( 50 ),
( 51 ), ( 52 ), ( 53 ), ( 54 ), ( 55 ), ( 56 ), ( 57 ), ( 58 ), ( 59 ), ( 60 ),
( 61 ), ( 62 ), ( 63 ), ( 64 ), ( 65 ), ( 66 ), ( 67 ), ( 68 ), ( 69 ), ( 70 ),
( 71 ), ( 72 ), ( 73 ), ( 74 ), ( 75 ), ( 76 ), ( 77 ), ( 78 ), ( 79 ), ( 80 ),
( 81 ), ( 82 ), ( 83 ), ( 84 ), ( 85 ), ( 86 ), ( 87 ), ( 88 ), ( 89 ), ( 90 ),
( 91 ), ( 92 ), ( 93 ), ( 94 ), ( 95 ), ( 96 ), ( 97 ), ( 98 ), ( 99 ), ( 100 );
Now we get the num-th tag (num being a row in numbers) with the following query:
SELECT n.num, SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) as one_tag
FROM (
SELECT
GROUP_CONCAT(tags SEPARATOR ',') AS all_tags,
LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
Returns:
+-----+---------------------+
| num | one_tag |
+-----+---------------------+
| 1 | pho |
| 2 | pork |
| 3 | fried-rice |
| 4 | chicken |
| 5 | fried-rice |
| 6 | pork |
| 7 | chicken-calzone |
| 8 | chicken |
| 9 | fettuccine |
| 10 | chicken |
| 11 | spaghetti |
| 12 | chicken |
| 13 | spaghetti |
| 14 | chorizo |
| 15 | spaghetti |
| 16 | meat-balls |
| 17 | miso-soup |
| 18 | chanko-nabe |
| 19 | chicken-manchurian |
| 20 | chicken |
| 21 | manchurain |
| 22 | pork-manchurian |
| 23 | pork |
| 24 | manchurain |
| 25 | sweet-and-sour-pork |
| 26 | pork |
| 27 | peking-duck |
| 28 | duck |
+-----+---------------------+
28 rows in set (0.01 sec)
Now that we have ordinary rows, we can easily count the occurrences of each tag.
See the top of this answer for the full query.
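Since the question also allows doing the work in application code, here is a rough Python sketch that walks through the same steps (concatenate, count the tags, join against the numbers, group and count). It is an illustration under the assumption that the tag strings have already been fetched into a list, and `range(1, 101)` merely plays the role of the numbers table.

```python
from collections import Counter

# Tag strings as they appear in the test table, assumed already fetched.
rows = [
    "pho,pork", "fried-rice,chicken", "fried-rice,pork",
    "chicken-calzone,chicken", "fettuccine,chicken", "spaghetti,chicken",
    "spaghetti,chorizo", "spaghetti,meat-balls", "miso-soup", "chanko-nabe",
    "chicken-manchurian,chicken,manchurain", "pork-manchurian,pork,manchurain",
    "sweet-and-sour-pork,pork", "peking-duck,duck",
]

all_tags = ",".join(rows)                # GROUP_CONCAT
count_tags = all_tags.count(",") + 1     # number of separators + 1
numbers = range(1, 101)                  # stand-in for the numbers table

# JOIN numbers n ON n.num <= count_tags, picking the num-th tag each time.
parts = all_tags.split(",")
one_tag = [parts[num - 1] for num in numbers if num <= count_tags]

# GROUP BY one_tag ORDER BY cnt DESC
counts = Counter(one_tag)
for tag, cnt in counts.most_common():
    print(tag, cnt)
```

One caveat on the SQL side: GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), so a larger table may need that setting raised before the query at the top gives complete counts.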
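Answer 1 (score: 6)
Alain Tiembo has a nice answer that explains a lot of the mechanics. However, his solution requires an extra table (numbers) to solve the problem. As a follow-up answer, I have combined all of his steps into a single query (using tablename as the original table):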
SELECT t.tags, count(*) AS occurence FROM
(SELECT
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.tags, ',', numbers.n), ',', -1) tags
FROM
(SELECT 1 n UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4) numbers INNER JOIN tablename
ON CHAR_LENGTH(tablename.tags)
-CHAR_LENGTH(REPLACE(tablename.tags, ',', ''))>=numbers.n-1
ORDER BY
id, n) t
GROUP BY t.tags
ORDER BY occurence DESC, t.tags ASC
For a demonstration, see the SQLFiddle.
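One thing worth knowing about the inline numbers list in this query: it only contains 1 to 4, so each row is split into at most four tags. The Python sketch below imitates the per-row join to show the effect. The function name and the `max_n` parameter are made up for this illustration.

```python
def split_with_numbers(tags, max_n=4):
    """Imitate joining one row against the numbers 1..max_n (hypothetical helper)."""
    parts = tags.split(",")
    # CHAR_LENGTH(tags) - CHAR_LENGTH(REPLACE(tags, ',', ''))
    comma_count = len(tags) - len(tags.replace(",", ""))
    # Join condition from the query: comma_count >= n - 1
    return [parts[n - 1] for n in range(1, max_n + 1) if comma_count >= n - 1]


print(split_with_numbers("chicken-manchurian,chicken,manchurain"))
print(split_with_numbers("a,b,c,d,e"))            # the fifth tag is dropped
print(split_with_numbers("a,b,c,d,e", max_n=5))   # a longer list keeps it
```

If a post can carry more than four tags, the UNION ALL list in the query would need more entries to match.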
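Answer 2 (score: 2)
First, you should really store this in a junction table, with one row per post and tag. Sometimes, however, we have no control over the structure of the data we are working with.
Assuming you have a list of valid tags, you can do what you want with: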
select vt.tag, count(t.postid) as cnt
from validtags vt left join
tablename t -- your posts table; a bare "table" is a reserved word in MySQL
on find_in_set(vt.tag, t.tags) > 0
group by vt.tag
order by cnt desc;
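For readers unfamiliar with find_in_set, this Python sketch approximates what the join does. The helper `find_in_set`, the `valid_tags` list, and the `posts` dictionary are assumptions made up for the example. The real function is built into MySQL.

```python
def find_in_set(needle, strlist):
    """Approximate MySQL's FIND_IN_SET: 1-based position in the list, or 0."""
    if strlist == "":
        return 0
    items = strlist.split(",")
    return items.index(needle) + 1 if needle in items else 0


# Hypothetical contents of the validtags table and a few posts.
valid_tags = ["chicken", "pork", "tofu"]
posts = {1: "pho,pork", 2: "fried-rice,chicken", 3: "fried-rice,pork"}

# LEFT JOIN ... GROUP BY vt.tag: unused tags stay in the result with a count of 0.
counts = {
    tag: sum(1 for tags in posts.values() if find_in_set(tag, tags) > 0)
    for tag in valid_tags
}

for tag, cnt in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(tag, cnt)
```

The match is on whole list elements, so chicken does not match chicken-calzone, which a LIKE '%chicken%' search would get wrong.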
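Answer 3 (score: 1)
The recommended approach is not to store multiple values in a single column, but to create an intersection table instead.
Your tables would then have the following columns:
1. tags: tag_id, name
2. posts: post_id, category_code
3. int_tags_to_posts: post_id, tag_id
To get the counts: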
select t.name, count(*) from tags t, posts p, int_tags_to_posts i where i.post_id = p.post_id and i.tag_id = t.tag_id group by i.tag_id order by count(*) desc;
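As a rough way to try this layout out, the sketch below builds the three tables in an in-memory SQLite database from Python and fills them from comma-separated strings. SQLite is only a stand-in for MySQL here (for example, INSERT OR IGNORE is SQLite syntax), the three sample posts are assumed, and the counting query is written with explicit joins rather than copied from above.

```python
import sqlite3

# Assumed sample posts in the old comma-separated form.
raw = {1: "pho,pork", 2: "fried-rice,chicken", 3: "fried-rice,pork"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, category_code INTEGER);
    CREATE TABLE int_tags_to_posts (
        post_id INTEGER REFERENCES posts(post_id),
        tag_id  INTEGER REFERENCES tags(tag_id),
        PRIMARY KEY (post_id, tag_id)
    );
""")

for post_id, tag_list in raw.items():
    conn.execute("INSERT INTO posts VALUES (?, 1)", (post_id,))
    for name in tag_list.split(","):
        # Create the tag once, then link it to the post.
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        (tag_id,) = conn.execute(
            "SELECT tag_id FROM tags WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("INSERT INTO int_tags_to_posts VALUES (?, ?)", (post_id, tag_id))

result = conn.execute("""
    SELECT t.name, COUNT(*) AS cnt
    FROM int_tags_to_posts i
    JOIN tags t ON t.tag_id = i.tag_id
    GROUP BY t.tag_id, t.name
    ORDER BY cnt DESC, t.name
""").fetchall()

for name, cnt in result:
    print(name, cnt)
```

With one row per post and tag, the count becomes a plain GROUP BY and no string splitting is needed at query time.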
Answer 4 (score: 0)
This should work:
SELECT tag, count(0) count FROM (
SELECT tOut.*, REPLACE(SUBSTRING(SUBSTRING_INDEX(tags, ',', ocur_rank), LENGTH(SUBSTRING_INDEX(tags, ',', ocur_rank - 1)) + 1), ',', '') tag
FROM (
SELECT @num_type := if(@id_check = tY.id, @num_type + 1, 1) AS ocur_rank, @id_check := tY.id as id_check, tY.*
FROM (
SELECT LENGTH(tags) - LENGTH(REPLACE(tags, ',', '')) AS num_ocur, id, tags FROM tablename
) tX
INNER JOIN (SELECT LENGTH(tags) - LENGTH(REPLACE(tags, ',', '')) AS num_ocur, id, tags FROM tablename) tY
INNER JOIN (SELECT @num_type := 0, @id_check := 'some_id') tZ
) tOut
WHERE ocur_rank <= num_ocur + 1
) tempTable GROUP BY tag ORDER BY count DESC;
Replace "tablename" with the name of your table.
This answer is derived from a solution posted by Jesse Perring on this page:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#c12113