MySQL: count REGEXP matches for each unique value in a column

Date: 2018-05-12 16:31:29

Tags: mysql

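In my table with artist and title columns, I want to find every unique artist and count the rows per artist (rows can be duplicated). The catch is that some songs are performed by duos or even larger groups, and I can't work out how to handle that case. Take this small part of the table as an example: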

artist                                  title
Beyoncé                                 Halo
Lady GaGa                               Paparazzi
Lady GaGa                               Poker Face
Lady GaGa                               Poker Face
Lady GaGa & Beyoncé                     Telephone
Rihanna                                 Disturbia
Rihanna & Kanye West & Paul McCartney   FourFiveSeconds

My current query is:

SELECT artist, COUNT(*) AS total
FROM (
    SELECT artist
    FROM list
    GROUP BY artist, title
) AS a
GROUP BY artist
ORDER BY total DESC

Result:

artist      total
Lady GaGa   2
Beyoncé     1
Rihanna     1

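The results I want are 3, 2 and 2 respectively, because Beyoncé and GaGa share a duet that is not being counted, and Rihanna has a song performed as a trio. Kanye West and Paul McCartney have no solo songs in this example, so I don't want them counted. I should probably use the REGEXP operator somehow... Any help would be appreciated.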

2 answers:

Answer 0 (score: 1)

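I managed to piece together SQL that works for your example, based on the small data set in it.

As I mentioned in my earlier comment, a stored procedure splits each artist string and inserts the pieces into a temporary table. From there it is a simple count-and-group query, just like the one you have above.

Once the function and the procedures have been created, you should be able to execute the stored procedure listArtistCounts to get the result.

The Stack Overflow posts I drew on are linked above the code.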

SPLIT_STR function

source

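delimiter $$

-- Returns the pos-th (1-based) piece of x split on delim, or '' if there is no such piece.
CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
DETERMINISTIC
-- CHAR_LENGTH (characters) rather than LENGTH (bytes), so that multi-byte
-- names such as 'Beyoncé' do not shift the SUBSTRING offset.
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       CHAR_LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1),
       delim, '');
$$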
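
As a quick check of how SPLIT_STR behaves, the calls below pull pieces out of one of the artist strings from the question. This is only a usage sketch; it assumes the function above has already been created and that the client delimiter has been switched back to ;.

```sql
delimiter ;

-- Usage sketch: assumes SPLIT_STR already exists in the current database.
SELECT SPLIT_STR('Rihanna & Kanye West & Paul McCartney', ' & ', 1) AS piece_1,  -- 'Rihanna'
       SPLIT_STR('Rihanna & Kanye West & Paul McCartney', ' & ', 2) AS piece_2,  -- 'Kanye West'
       SPLIT_STR('Rihanna & Kanye West & Paul McCartney', ' & ', 4) AS piece_4;  -- '' (no 4th piece)
```

The empty string returned for a position past the last piece is what the splitting loop in AddSplitRecords relies on to stop.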

AddSplitRecords procedure

delimiter $$
-- Splits fullstr on ' & ' and inserts each piece into temp_table.
CREATE PROCEDURE AddSplitRecords(IN fullstr VARCHAR(255))
BEGIN
   DECLARE a INT DEFAULT 0;
   DECLARE str VARCHAR(255);
   simple_loop: LOOP
      SET a = a + 1;
      -- Take the a-th piece of the string, split on ' & '
      SET str = SPLIT_STR(fullstr, ' & ', a);
      -- SPLIT_STR returns '' once a runs past the last piece
      IF str = '' THEN
         LEAVE simple_loop;
      END IF;

      -- Insert the piece as one row of the temp table
      INSERT INTO temp_table VALUES (str);
   END LOOP simple_loop;
END $$

listArtistCounts procedure

source

create procedure listArtistCounts()
begin
-- Variable to hold artist field from query
declare cArtistList varchar(255);

-- Variables related to cursor:
--    1. 'done' will be used to check if all the rows in the cursor 
--       have been read
--    2. 'curArtist' will be the cursor: it will fetch each row
--    3. The 'continue' handler will update the 'done' variable

declare done int default false;
declare curArtist cursor for
    SELECT artist FROM list GROUP BY artist, title; -- This is the query used by the cursor.

declare continue handler for not found -- This handler will be executed if no row is found in the cursor (for example, if all rows have been read).
    set done = true;

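CREATE TEMPORARY TABLE IF NOT EXISTS temp_table(artist varchar(255) NOT NULL);
-- Clear rows left over from an earlier call in the same session,
-- otherwise repeated calls would add to the previous counts.
DELETE FROM temp_table;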

-- Open the cursor: This will put the cursor on the first row of its rowset.
open curArtist;

-- Begin the loop (that 'loop_artist' is a label for the loop)
loop_artist: loop
    -- When you fetch a row from the cursor, the data from the current
    -- row is read into the variables, and the cursor advances to the
    -- next row. 
    -- If there's no next row, the 'continue handler for not found'
    -- will set the 'done' variable to 'TRUE'
    fetch curArtist into cArtistList;

    -- Exit the loop if you're done
    if done then
        leave loop_artist;
    end if;

    -- Split the fetched artist string and insert each piece into temp_table.
    CALL AddSplitRecords(cArtistList);

end loop;

-- Close the cursor when done
close curArtist;

-- Output results of distinct artists counted
SELECT artist, COUNT(*) AS total FROM temp_table GROUP BY artist ORDER BY total DESC;

end $$
delimiter ;
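
To try the procedures end to end, something like the following could be run in a scratch database. The CREATE TABLE definition is an assumption (the question does not show the real schema), and the rows are the sample rows from the question. Because the procedure counts every name it splits out, Kanye West and Paul McCartney also appear in its output; the last query is one possible way, not part of the code above, to keep only artists that have at least one row of their own.

```sql
-- Hypothetical schema: the question only names the columns artist and title.
CREATE TABLE list (
    artist VARCHAR(255) NOT NULL,
    title  VARCHAR(255) NOT NULL
);

-- Sample rows copied from the question (including the duplicate Poker Face row).
INSERT INTO list (artist, title) VALUES
    ('Beyoncé', 'Halo'),
    ('Lady GaGa', 'Paparazzi'),
    ('Lady GaGa', 'Poker Face'),
    ('Lady GaGa', 'Poker Face'),
    ('Lady GaGa & Beyoncé', 'Telephone'),
    ('Rihanna', 'Disturbia'),
    ('Rihanna & Kanye West & Paul McCartney', 'FourFiveSeconds');

-- Fills temp_table and returns one row per split-out name.
CALL listArtistCounts();

-- Optional filter (an addition, not the answer's code): keep only names
-- that also occur on their own as an artist value in list.
SELECT t.artist, COUNT(*) AS total
FROM temp_table AS t
WHERE t.artist IN (SELECT artist FROM list)
GROUP BY t.artist
ORDER BY total DESC;
```

temp_table is a session-level temporary table, so it is still available after the call returns. On the sample rows the filtered query should give 3 for Lady GaGa and 2 each for Beyoncé and Rihanna.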

Answer 1 (score: 0)

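Without a properly normalized schema for the list table, your expected result looks hard to obtain...

You could try an inner join, for example, to get (at least) the counts for similar artists on separate rows: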

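  -- for each row of list, pick up the distinct-title count of every
  -- artist string that contains that row's artist value
  select a.artist, t.my_num
  from list a
  inner join (
    select artist, count(distinct title) my_num
    from list
    group by artist
  ) t on t.artist like concat('%', a.artist, '%')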