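Multi-label grouping with an SQL query

Time: 2017-07-26 04:00:47

Tags: sql multilabel-classification

I want to browse a list of multi-label entries and find data about common groups of labels. My table is structured as follows: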

---------------------------------------------------------------------
|GameID |   Title   |Label1 |Label2 |Label3 |Label4 |...    |LabelN |
|-------|-----------|-------|-------|-------|-------|-------|-------|
|1      |   A       |   1   |   0   |   1   |   0   |   ... |   1   |   
|2      |   B       |   1   |   1   |   0   |   1   |   ... |   0   |
|3      |   C       |   0   |   0   |   1   |   1   |   ... |   0   |
|4      |   D       |   1   |   0   |   0   |   0   |   ... |   1   |
|...    |   ...     |   ... |   ... |   ... |   ... |   ... |   ... |
---------------------------------------------------------------------

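If an entry has a 1 under a label, it is associated with that label; otherwise it is not. For example, game A is associated with "Label1", "Label3", ... and "LabelN".

Now, take this SQL query as an example: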

SELECT GameID, Title
FROM GameTagsBinary
WHERE "Gun Customization" = 1 AND "Zombies" = 1

This query returns the following table:

-------------------------------------
|GameID |   Title                   |
|-------|---------------------------|
|263070 |   Blockstorm              |
|209870 |   Blacklight: Retribution |
|436520 |   Line of Sight           |
-------------------------------------

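What I want is a query that goes through every column, Label1 through LabelN, and outputs the number of games associated with each combination of labels: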

-------------------------------------------------
|Combination                |   NumberOfGames   |
|---------------------------|-------------------|   
|Label1 + Label2            |   5               |
|Label1 + Label3            |   11              |
|Label1 + Label4            |   9               |
|...                        |   ...             |
|Gun Customization + Zombies|   3               |
|...                        |   ...             |
|LabelN + Label(N-1)        |   7               |
-------------------------------------------------

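2 answers:

Answer 0: (score: 1)

Try the following logic (replace YOUR_TABLE_NAME with your actual table name).

You may need to create a stored procedure to run the SQL script and get the result.

By the way, the code is written for SQL Server; some of it will differ on other databases.

Code to find the distinct column combinations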

-- skip the non-label columns (GameID and Title in the question's table)
select result.* into #temp from (
select ROW_NUMBER() over (order by a.name, b.name) as id, a.name as a_name, b.name as b_name from 
(select name from sys.columns where object_id=object_id('YOUR_TABLE_NAME') and name not in ('GameID', 'Title')) a
 cross join
(select name from sys.columns where object_id=object_id('YOUR_TABLE_NAME') and name not in ('GameID', 'Title')) b
where a.name <> b.name ) result

select a_name, b_name into #combination from #temp temp1 where not exists(select 1 from #temp temp2 where temp1.a_name = temp2.b_name and temp1.b_name = temp2.a_name and temp1.id > temp2.id)

Loop over the combinations with a cursor and insert the counts into another temp table

create table #result
(
    Combination varchar(100),
    NumberOfGames int
)

DECLARE @a_name varchar(100);
DECLARE @b_name varchar(100);
DECLARE @combination_string varchar(100);
DECLARE @count int;
-- sp_executesql requires the statement to be nvarchar
DECLARE @sql nvarchar(1000);
DECLARE combo_cursor CURSOR FOR select a_name, b_name FROM #combination
OPEN combo_cursor
FETCH NEXT FROM combo_cursor INTO @a_name, @b_name;

WHILE @@FETCH_STATUS = 0 
BEGIN
    set @combination_string = @a_name + ' + ' + @b_name

    -- QUOTENAME brackets column names that contain spaces, e.g. [Gun Customization]
    set @sql = N'select @count = count(*) from YOUR_TABLE_NAME where ' + QUOTENAME(@a_name) + N' = 1 and ' + QUOTENAME(@b_name) + N' = 1'

    -- @count is passed as an OUTPUT parameter so the dynamic statement can return its value
    EXEC sp_executesql @sql, N'@count int OUTPUT', @count = @count OUTPUT

    insert into #result (Combination, NumberOfGames)
    values (@combination_string, @count)

    FETCH NEXT FROM combo_cursor INTO @a_name, @b_name;
END

CLOSE combo_cursor;
DEALLOCATE combo_cursor;

Your final result

select * from #result

Drop the temp tables after execution

drop table #temp
drop table #combination
drop table #result
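
If you want to try the pairwise-count idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. It is not the SQL Server script above, just the same loop in a form that runs anywhere. The function name count_label_pairs is made up for illustration, and the sample rows are the four games from the question with only Label1 to Label4 (the elided columns are left out). It assumes the column names contain no double quotes.

```python
import sqlite3
from itertools import combinations


def count_label_pairs(conn, table, id_columns=("GameID", "Title")):
    """Return {(label_a, label_b): number of rows where both columns equal 1}."""
    # PRAGMA table_info lists the table's columns; index 1 of each row is the name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    labels = [c for c in columns if c not in id_columns]
    counts = {}
    for a, b in combinations(labels, 2):
        # Identifiers cannot be bound as parameters, so they are double-quoted here.
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{a}" = 1 AND "{b}" = 1'
        counts[(a, b)] = conn.execute(sql).fetchone()[0]
    return counts


# Sample data: assumption, copied from the visible part of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GameTagsBinary (GameID INTEGER, Title TEXT, "
    "Label1 INTEGER, Label2 INTEGER, Label3 INTEGER, Label4 INTEGER)"
)
conn.executemany(
    "INSERT INTO GameTagsBinary VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 1, 0, 1, 0),
        (2, "B", 1, 1, 0, 1),
        (3, "C", 0, 0, 1, 1),
        (4, "D", 1, 0, 0, 0),
    ],
)

pair_counts = count_label_pairs(conn, "GameTagsBinary")
for (a, b), n in pair_counts.items():
    print(f"{a} + {b}: {n}")
```

itertools.combinations yields each unordered pair once, so no second pass is needed to drop the reversed duplicates.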

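Answer 1: (score: 0)

Have you tried using count() in the select statement?

For example:

Select count(*) From GameTagsBinary Where "Gun Customization" = 1 AND "Zombies" = 1

However, I'm not sure how to do this for all labels at once; I assume it could be done in one large query with a lot of joins. Sorry I can't be of more help, but this should help a little.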
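
One way the "single large query" idea might look is to turn the wide 0/1 table into (GameID, Label) rows and join that result to itself. The following is only a sketch under assumptions: it uses Python's sqlite3 with an in-memory copy of the four sample games from the question, the label list is hard-coded to Label1 to Label4, and the CTE name GameLabels is made up. It also assumes label names contain no quotes or brackets.

```python
import sqlite3

# Sample data: assumption, copied from the visible part of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GameTagsBinary (GameID INTEGER, Title TEXT, "
    "Label1 INTEGER, Label2 INTEGER, Label3 INTEGER, Label4 INTEGER)"
)
conn.executemany(
    "INSERT INTO GameTagsBinary VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 1, 0, 1, 0),
        (2, "B", 1, 1, 0, 1),
        (3, "C", 0, 0, 1, 1),
        (4, "D", 1, 0, 0, 0),
    ],
)

labels = ["Label1", "Label2", "Label3", "Label4"]  # hard-coded for the sketch

# One SELECT per label column: keeps a (GameID, Label) row wherever the flag is 1.
unpivot = " UNION ALL ".join(
    f"SELECT GameID, '{name}' AS Label FROM GameTagsBinary WHERE [{name}] = 1"
    for name in labels
)

# Self-join on GameID; a.Label < b.Label keeps each unordered pair once.
query = f"""
WITH GameLabels AS ({unpivot})
SELECT a.Label || ' + ' || b.Label AS Combination, COUNT(*) AS NumberOfGames
FROM GameLabels AS a
JOIN GameLabels AS b
  ON a.GameID = b.GameID AND a.Label < b.Label
GROUP BY a.Label, b.Label
ORDER BY a.Label, b.Label
"""

rows = conn.execute(query).fetchall()
for combination, number_of_games in rows:
    print(combination, number_of_games)
```

Because this is an inner join, pairs that no game shares (Label2 + Label3 in the sample data) do not appear in the result at all rather than showing up with a count of 0.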