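I want to group by the distinct values in a string column that holds multiple values.
The column contains a list of strings in a standard format, separated by commas. The only possible values are a, b, c, d.
For example, the column collection (type: string) contains: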
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is the count of each unique value:
collection | count
a | 2
b | 3
c | 2
d | 1
Answer 0 (score: 1)
For everything below I use this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
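If the only possible values are a, b, c, d, you can try one of the following approaches.
Note that this only works if you do not have values such as test and test_new, because LIKE matches substrings: test would then also join with every test_new row, and the counts would be wrong.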
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
This gives you the desired result:
collection | count
a | 2
b | 3
c | 2
d | 1
Or you can try this:
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result will look like this:
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
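If the values could be substrings of one another (the test / test_new case mentioned above), one possible variation is to match whole comma-separated items with MySQL's FIND_IN_SET instead of LIKE. This is only a sketch against the tmp table defined above; the alias v is made up for illustration, and depending on your connection and table collations you may need the same COLLATE clause as in the first query.

```sql
-- Sketch, assuming the tmp table from this answer and a comma-separated
-- list without spaces in tmp.test (e.g. 'b,c,a').
-- FIND_IN_SET returns the 1-based position of an exact item, or 0 if absent,
-- so 'test' does not match inside 'test_new'.
SELECT v.collection, COUNT(*) AS count
FROM tmp
JOIN (
    SELECT 'a' AS collection
    UNION ALL SELECT 'b'
    UNION ALL SELECT 'c'
    UNION ALL SELECT 'd'
) v ON FIND_IN_SET(v.collection, tmp.test) > 0
GROUP BY v.collection;
```

FIND_IN_SET does not work correctly if an item itself contains a comma, and it expects no spaces around the separators.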
Answer 1 (score: 1)
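What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift, the only way to generate new rows is a JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and keep only the rows whose id is less than or equal to the number of elements in the collection. Then we use the split_part function to read the item at the correct index. Once we have the exploded table, we do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]'), you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART, respectively.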
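The steps described in this answer could be sketched roughly as follows. This is not the answerer's original query: the table name my_table and the CTE names numbers and exploded are assumptions for illustration, and the inline numbers list stops at 4 because only a, b, c, d can occur.

```sql
-- Sketch for Redshift, assuming my_table.collection holds plain
-- comma-separated text such as 'b,c,a' (hypothetical table name).
WITH numbers AS (
    -- Static list of consecutive ids; extend it if a row can hold more items.
    SELECT 1 AS id UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4
),
exploded AS (
    -- One output row per item: id n reads the n-th comma-separated part.
    SELECT SPLIT_PART(t.collection, ',', n.id) AS item
    FROM my_table t
    CROSS JOIN numbers n
    -- Number of items = number of commas + 1.
    WHERE n.id <= REGEXP_COUNT(t.collection, ',') + 1
)
SELECT item AS collection, COUNT(*) AS item_count
FROM exploded
GROUP BY item
ORDER BY item;
```

For JSON array strings, the same shape should work with JSON_ARRAY_LENGTH(t.collection) in the WHERE clause and JSON_EXTRACT_ARRAY_ELEMENT_TEXT(t.collection, n.id - 1) in the SELECT list; note that the JSON function takes a zero-based index, while SPLIT_PART is one-based.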