T-SQL: grouping sets of information

Time: 2011-11-21 20:19:30

Tags: database tsql group-by analytics

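I have a problem that my limited SQL knowledge is keeping me from working out.

First, the problem:

I have a database that I need to run a report on; it contains configurations of user entitlements. The report needs to show a distinct list of these configurations, with a count against each one.

So a row in my database looks like this: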

    USER_ID  SALE_ITEM_ID  SALE_ITEM_NAME  PRODUCT_NAME  CURRENT_LINK_NUM  PRICE_SHEET_ID
    37715    547           CultFREE        CultPlus      0                 561

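The row above is one row of a user's configuration; for each user ID there can be 1-5 of these rows. So a configuration is defined as multiple rows of data that share a common user ID and have varying attributes.

I need a distinct list of these configurations across the whole table, giving just one configuration set for each case where more than one user has that configuration, together with a count of the instances of that configuration.

I hope that's clear?

Any ideas?!?!

I've tried all sorts of GROUP BYs and UNIONs, and the grouping sets feature, all to no avail.

If anyone could give me some pointers, that would be great!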

2 answers:

Answer 0: (score: 0)

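    SELECT
        USER_ID,
        SALE_ITEM_ID, ETC...,
        COUNT(*) WhateverYouWantToNameCount
    FROM TableNAme
    -- every non-aggregated column in the SELECT list has to appear in the GROUP BY too
    GROUP BY USER_ID, SALE_ITEM_ID, ETC...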
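
As a rough sketch of what a plain GROUP BY can deliver on this table: grouping on the configuration columns (rather than on USER_ID) gives a count of distinct users per individual configuration row. The column names are taken from the sample row in the question and `TableNAme` is still a placeholder, so treat both as assumptions. It does not yet combine rows into per-user sets, which is the harder part of the question.

```sql
-- Sketch only: TableNAme is a placeholder table name; the columns come from
-- the sample row in the question.
SELECT
    SALE_ITEM_ID,
    SALE_ITEM_NAME,
    PRODUCT_NAME,
    CURRENT_LINK_NUM,
    PRICE_SHEET_ID,
    COUNT(DISTINCT USER_ID) AS UserCount   -- users that have this exact row
FROM TableNAme
GROUP BY
    SALE_ITEM_ID,
    SALE_ITEM_NAME,
    PRODUCT_NAME,
    CURRENT_LINK_NUM,
    PRICE_SHEET_ID
ORDER BY UserCount DESC;
```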

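Answer 1: (score: 0)

Ouch, that hurts ...

OK, the problem:

  1. A row represents one configurable line
  2. A user may be linked to more than one configuration row
  3. Configuration rows, when grouped together, form a configuration set
  4. We want to figure out all the distinct configuration sets
  5. We want to know which users are using them.

The solution (it's a bit messy, but the idea is there; copy and paste it into SQL Management Studio) ...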

    -- ok so i imported the data to a table named SampleData ... 
    -- 1. import the data 
    -- 2. add a new column
    -- 3. select all the values of the config in to the new column (Configuration_id)
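    --   (assumes the new column is a varchar; the numeric columns are cast so that + concatenates
    --    instead of adding, and a separator keeps e.g. 12 + 3 distinct from 1 + 23)
    --UPDATE [dbo].[SampleData]
    --SET [Configuration_ID] = CAST(SALE_ITEM_ID AS varchar(20)) + '|' + SALE_ITEM_NAME + '|' + [PRODUCT_NAME] + '|' + CAST([CURRENT_LINK_NUM] AS varchar(20)) + '|' + CAST([PRICE_SHEET_ID] AS varchar(20))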
    
    -- 4. i then selected just the distinct values of those and found 6 distinct Configuration_id's 
    --SELECT DISTINCT [Configuration_ID] FROM [dbo].[SampleData]
    
    -- 5. to make them a bit easier to read and work with i gave them int values instead 
    --    for me it was easy to do this manually but you might wanna do some trickery here to autonumber them or something 
    --    basic idea is to run the step 4 statement but select into a new table then add a new primary key column and set identity spec on it
    --    that will generate u a bunch of incremental numbers for your config id's so u can then do something like ...
    --UPDATE sd
    --SET Configuration_ID = (SELECT ID FROM TempConfigTable WHERE Config_ID = sd.Configuration_ID)
    --FROM [dbo].[SampleData] sd
    
    -- at this point you have all your existing rows with a unique ident for the values combined in each row.
    -- so for example in my dataset i have several rows where only the user_id has changed but all look like this ...
    --SALE_ITEM_ID  SALE_ITEM_NAME  PRODUCT_NAME    CURRENT_LINK_NUM    PRICE_SHEET_ID  Configuration_ID
    --54101 TravelFREE  TravelPlus  0   56101   1
    
    -- now you have a config id you can start to work on building sets up ...
    -- each user is now matched with 1 or more config id 
    -- 6. we use a CTE (common table expression) to link the possibles (keeps the join small) ...
    --WITH Temp (ConfigID)
    --AS
    --(
    --  SELECT DISTINCT SD.Configuration_Id --SD2.Configuration_Id, SD3.Configuration_Id, SD4.Configuration_Id, SD5.Configuration_Id, 
    --  FROM [dbo].[SampleData] SD
    --)
    -- this extracts all the possible combinations using the CTE
    -- on the basis of what you told me, max rows per user is 6, in the result set i have i only have 5 distinct configs
    -- meaning i gain nothing by doing a 6th join.
    -- cross joins basically give you every combination of unique values from the 2 tables but we joined back on the same table 
    -- so its every possible combination of Temp + Temp (ConfigID + ConfigID) ... per cross join so with 5 joins its every combination of 
    -- Temp + Temp + Temp + Temp + Temp .. good job temp only has 1 column with 5 values in it
    -- 7. uncomment both this and the CTE above ... need to use them together
    --SELECT DISTINCT T.ConfigID C1, T2.ConfigID C2, T3.ConfigID C3, T4.ConfigID C4, T5.ConfigID C5
    --INTO [SETS]
    --FROM Temp T
    --CROSS JOIN Temp T2
    --CROSS JOIN Temp T3
    --CROSS JOIN Temp T4
    --CROSS JOIN Temp T5
    
    -- notice the INTO clause ... this dumps me out a new [SETS] table in my db
    -- if i go add a primary key to this and set its ident spec i now have unique set id's 
    -- for each row in the table.
    --SELECT *
    --FROM [dbo].[SETS]
    
    -- now here's where it gets interesting ... row 1 defines a set as being config id 1 and nothing else 
    -- row 2 defines set 2 as being config 1 and config 2 and nothing else ... and so on ...
    -- the problem here of course is that 1,2,1,1,1 is technically the same set as 1,1,1,2,1 from our point of view
    -- ok lets assign a set to each userid ...
    -- 8. first we pull the distinct id's out ...
    --SELECT DISTINCT USER_ID usr, null SetID 
    --INTO UserSets
    --FROM SampleData 
    
    -- now we need to do a bit of operating on these that's a bit much for a single update or select so ...
    -- 9. process findings in a loop
    DECLARE @currentUser int
    DECLARE @set int
    -- while there's a userid not linked to a set
    WHILE EXISTS (SELECT 1 FROM UserSets WHERE SetId IS NULL)
    BEGIN
        -- pick the next userid that still needs a set
        SELECT TOP 1 @currentUser = usr FROM UserSets WHERE SetId IS NULL
        -- figure out a set to link it to
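        SET @set = (
            SELECT TOP 1 ID 
            FROM [SETS]
            -- shouldn't really do this ... basically need to refactor in to a table variable then compare to that
            -- that way the table lookup on ur main data is only 1 per User_id
            WHERE C1 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
            AND C2 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
            AND C3 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
            AND C4 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
            AND C5 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
            -- also require that every config the user has shows up somewhere in the set,
            -- otherwise 1,1,1,1,1 would match a user who holds configs 1 and 2
            AND NOT EXISTS (
                SELECT 1 FROM SampleData sd
                WHERE sd.USER_ID = @currentUser
                AND sd.Configuration_id NOT IN (C1, C2, C3, C4, C5)
            )
            -- lowest ID first, so permutations like 1,2,1,1,1 and 1,1,1,2,1 always resolve to the same set
            ORDER BY ID
        )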
        -- hopefully that worked 
        IF(@set IS NOT NULL)
        BEGIN 
            -- tell the usersets table 
            UPDATE UserSets SET SetId = @set WHERE usr = @currentUser
            set @set = null
        END
        ELSE -- something went wrong ... set to 0 to prevent endless loop but any userid linked to set 0 is a problem u need to look at
            UPDATE UserSets SET SetId = 0 WHERE usr = @currentUser
        -- and round we go again ... until we are done
    END
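
For comparison, the same idea (one identifier per configuration row, then one identifier per user's set of rows) can probably be expressed without the loop or the helper tables. The following is only a sketch under assumptions, not the method used above: it assumes the data lives in `dbo.SampleData`, that none of the five configuration columns is NULL, and that the `|` and `;` separators never occur in the data. It builds a sorted, delimited signature per user with `FOR XML PATH` (available since SQL Server 2005) and groups on that signature. The `HAVING` clause reflects one reading of the question (only sets shared by more than one user); drop it to list every set.

```sql
-- Sketch only: table name, NOT NULL columns and separator characters are assumptions.
WITH UserRows AS (
    -- one text key per distinct configuration row of each user
    SELECT DISTINCT
        USER_ID,
        CAST(SALE_ITEM_ID AS nvarchar(20)) + N'|' +
        CAST(SALE_ITEM_NAME AS nvarchar(200)) + N'|' +
        CAST(PRODUCT_NAME AS nvarchar(200)) + N'|' +
        CAST(CURRENT_LINK_NUM AS nvarchar(20)) + N'|' +
        CAST(PRICE_SHEET_ID AS nvarchar(20)) AS RowKey
    FROM dbo.SampleData
),
UserSignatures AS (
    -- concatenate each user's row keys in sorted order, so the same set of rows
    -- always yields the same string regardless of row order
    SELECT
        u.USER_ID,
        (SELECT N';' + r.RowKey
         FROM UserRows r
         WHERE r.USER_ID = u.USER_ID
         ORDER BY r.RowKey
         FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)') AS ConfigSet
    FROM (SELECT DISTINCT USER_ID FROM UserRows) u
)
SELECT
    ConfigSet,
    COUNT(*) AS UserCount        -- number of users sharing this exact set
FROM UserSignatures
GROUP BY ConfigSet
HAVING COUNT(*) > 1              -- assumption: only sets used by more than one user
ORDER BY UserCount DESC;
```

Sorting inside the concatenation is what makes 1,2 and 2,1 count as the same set, which is the problem the permutation rows in [SETS] run into.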