Writing a complex SQL query

Time: 2012-06-06 11:40:19

Tags: sql sql-server-ce sql-server-ce-4

I have this table:

table session(
ID number,
SessionID VarChar,
Date,
Filter
)

The table holds search information, like this:

ID  SessionID                   Date                filter
4   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=5
6   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon
7   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon&meagPixel=12.1
8   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon
10  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Nikon
12  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=12.1
13  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=12.1&opticalZoom=True
14  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    meagPixel=12.1&opticalZoom=True&brand=Panasonic
16  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=500.00
18  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00
19  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00&brand=Olympus
21  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000
22  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica
23  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00
24  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True
25  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2
26  peqq421gaspts3nuulq5mwcq    24/05/2012 13:50    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345
27  peqq421gaspts3nuulq5mwcq    24/05/2012 13:58    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2
41  poiq41111spts00000q5aaaa    27/05/2012 13:48    meagPixel=5

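I want to get the unique searches. A unique search is:

  • The longest search (filter) of a user (session)
  • If the first filter changes, it must be treated as a new search (filter)

Because ASP.NET does not guarantee that SessionID is unique, it is the combination (SessionID, Date) that is unique.

I didn't get very far: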

SELECT        MAX(Filter)
FROM            Session
GROUP BY SessionID
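
One reason this attempt falls short is that MAX on a character column returns the value that sorts last, not the longest one. A minimal sketch of the difference, assuming Python's sqlite3 module as a stand-in engine; the two-row table and the session id 's1' are made up for illustration:

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER, sessionid TEXT, filter TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?)",
    [
        (7, "s1", "brand=Canon&meagPixel=12.1"),
        (16, "s1", "price=500.00"),
    ],
)

# MAX on text compares strings, so 'p...' sorts after 'b...' whatever the length.
max_filter = conn.execute(
    "SELECT MAX(filter) FROM session GROUP BY sessionid"
).fetchone()[0]

# Ordering by length picks the longest filter instead.
longest_filter = conn.execute(
    "SELECT filter FROM session ORDER BY LENGTH(filter) DESC LIMIT 1"
).fetchone()[0]

print(max_filter)
print(longest_filter)
```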

BTW, for the sample table data I gave, the result should be:

ID  SessionID                   Date                filter              
4   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=5     
7   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon&meagPixel=12.1      
10  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Nikon     
14  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    meagPixel=12.1&opticalZoom=True&brand=Panasonic     
16  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=500.00        
19  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00&brand=Olympus      
26  peqq421gaspts3nuulq5mwcq    24/05/2012 13:50    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345     
41  poiq41111spts00000q5aaaa    27/05/2012 13:48    meagPixel=5     

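Thanks for your help and guidance.

3 answers:

Answer 0: (score: 1)

@GarethD - Tx for the schema and insert queries. I tried a slightly different approach. I am not sure whether it works in all cases. It works on MySQL and MSSQL.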

          select * 
          from tsession t1 
          where  not exists (
                             select * 
                             from tsession t2 
                             where t2.filter  like concat(t1.filter,'%') 
                             and t1.filter<>t2.filter 
                             and t1.sessionid=t2.sessionid) 
          order by id;

It gives exactly the result the question asks for.
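
A rough way to check that claim is to run the same anti-join against the question's rows. The sketch below assumes Python's sqlite3 module as a stand-in engine, so `concat(t1.filter,'%')` is written with SQLite's `||` operator; the names ROWS, S1, S2 and kept_ids are illustrative only.

```python
import sqlite3

S1 = "peqq421gaspts3nuulq5mwcq"
S2 = "poiq41111spts00000q5aaaa"

# (ID, SessionID, Filter) from the question; Date is left out because the query ignores it.
ROWS = [
    (4, S1, "meagPixel=5"),
    (6, S1, "brand=Canon"),
    (7, S1, "brand=Canon&meagPixel=12.1"),
    (8, S1, "brand=Canon"),
    (10, S1, "brand=Nikon"),
    (12, S1, "meagPixel=12.1"),
    (13, S1, "meagPixel=12.1&opticalZoom=True"),
    (14, S1, "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
    (16, S1, "price=500.00"),
    (18, S1, "price=499.00"),
    (19, S1, "price=499.00&brand=Olympus"),
    (21, S1, "zoomRange=2000"),
    (22, S1, "zoomRange=2000&brand=Leica"),
    (23, S1, "zoomRange=2000&brand=Leica&price=1995.00"),
    (24, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True"),
    (25, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"),
    (26, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345"),
    (27, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"),
    (41, S2, "meagPixel=5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tsession (id INTEGER, sessionid TEXT, filter TEXT)")
conn.executemany("INSERT INTO tsession VALUES (?, ?, ?)", ROWS)

# Keep a row only if no other filter in the same session strictly extends it.
QUERY = """
SELECT t1.id
FROM tsession t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tsession t2
    WHERE t2.sessionid = t1.sessionid
      AND t2.filter <> t1.filter
      AND t2.filter LIKE t1.filter || '%'
)
ORDER BY t1.id
"""
kept_ids = [row[0] for row in conn.execute(QUERY)]
print(kept_ids)
```

One caveat of the LIKE-based prefix test: a filter that itself contains % or _ would be read as a wildcard pattern, so the match is only a true prefix check when the filters are free of those characters.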

Answer 1: (score: 0)

To get the longest search filter, you would do something like this:

select s.*
from (select s.*,
             row_number() over (partition by sessionid order by len desc) as rownum
      from (select s.*, len(filter) as len
            from session s
           ) s
     ) s
where rownum = 1

I am using window functions to do this. You could do the same thing with aggregation and a join.
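
The aggregation-and-join variant could look roughly like the sketch below. It assumes Python's sqlite3 module as a stand-in engine, so LEN becomes LENGTH, and it uses four rows taken from the question; the names conn and longest_ids are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER, sessionid TEXT, filter TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?)",
    [
        (6, "peqq421gaspts3nuulq5mwcq", "brand=Canon"),
        (7, "peqq421gaspts3nuulq5mwcq", "brand=Canon&meagPixel=12.1"),
        (16, "peqq421gaspts3nuulq5mwcq", "price=500.00"),
        (41, "poiq41111spts00000q5aaaa", "meagPixel=5"),
    ],
)

# Step 1 (derived table m): the maximum filter length per session.
# Step 2 (join): keep the rows whose filter has that length.
QUERY = """
SELECT s.id
FROM session s
JOIN (
    SELECT sessionid, MAX(LENGTH(filter)) AS maxlen
    FROM session
    GROUP BY sessionid
) m ON m.sessionid = s.sessionid
   AND LENGTH(s.filter) = m.maxlen
ORDER BY s.id
"""
longest_ids = [row[0] for row in conn.execute(QUERY)]
print(longest_ids)
```

Unlike row_number() with rownum = 1, the join returns every row that ties for the maximum length.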

However, you say that the session is not the real identifier; session/filter is. The following query gets almost what you want:

select s.*
from (select s.*,
             row_number() over (partition by sessionid, filter 
                                      order by len desc) as rownum
      from (select s.*, len(filter) as len
            from session s
           ) s
     ) s
where rownum = 1

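(The only change is that the partition clause includes the filter.)

You may get duplicates. If you want all of the duplicates, a slightly different query would do it.

Answer 2: (score: 0)

First, I think there is an error in your sample data: rows 25, 26 and 27 should all appear in your final result. Row 27 definitely should, because it is the only entry for its combination of session ID and date.

Assuming the above is correct, I think I have worked out your logic correctly.

Step 1 is to define the first search term of each filter, and the order in which each row appears within the session: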

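;WITH CTE AS
(   SELECT  *, 
            SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm],
            -- Assumed definition (this line is missing from the post): order of appearance within (SessionID, Date)
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder]
    FROM    Session
)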
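
To see what the SUBSTRING/CHARINDEX expression yields, here is a small sketch that runs the equivalent expression in SQLite through Python's sqlite3 module. The mapping of SUBSTRING, CHARINDEX and LEN to SQLite's substr, instr and length is an assumption made for illustration, and the names conn and first_terms are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (filter TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?)",
    [
        ("meagPixel=5",),
        ("brand=Canon&meagPixel=12.1",),
        ("price=499.00&brand=Olympus",),
    ],
)

# instr() returns 0 when there is no '&', in which case the whole filter is the first term;
# otherwise the first term is everything before the first '&'.
QUERY = """
SELECT filter,
       substr(filter, 1,
              CASE WHEN instr(filter, '&') = 0
                   THEN length(filter)
                   ELSE instr(filter, '&') - 1
              END) AS first_term
FROM session
"""
first_terms = dict(conn.execute(QUERY).fetchall())
print(first_terms)
```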

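The next step is to establish whether each search is a new search or a continuation of the previous one. This is done by fetching the previous search in the session (which is why SessionOrder was defined in the previous CTE) and checking whether its first search term is the same.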

, CTE2 AS
(   SELECT  T1.*, 
            CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch]
    FROM    CTE T1
            LEFT JOIN CTE T2
                ON  T1.SessionID = T2.SessionID
                AND T1.Date = T2.Date
                AND T1.FirstTerm != T2.FirstTerm
                AND T1.SessionOrder = T2.SessionOrder + 1
)

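Next, each new search needs its own rank within the session, for grouping purposes. That gives you the unique combination your rules define (SessionID, Date and first search term), and you can then order the items within each combination by the length of the filter: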

, CTE3 AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder]
    FROM    CTE2 T1
            OUTER APPLY
            (   SELECT  SUM(NewSearch) [SearchNumber]
                FROM    CTE2 T2
                WHERE   T1.SessionOrder >= T2.SessionOrder
                AND     T1.SessionID = T2.SessionID
                AND     T1.Date = T2.Date
            ) c
)

Finally, all you need to do is restrict the results to the longest search for each combination of SessionID, Date and first filter term:

SELECT  ID, SessionID, Date, Filter
FROM    CTE3
WHERE   SearchOrder = 1
ORDER BY ID
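
The steps above can also be traced outside SQL. The sketch below is plain Python, not the answer's query: it walks each session in ID order, starts a new search whenever the first term changes, and keeps the longest filter of each search. It groups by SessionID only and ignores Date, which is an assumption made so that the output can be compared with the question's expected rows. The names first_term, longest_per_search and unique_ids are illustrative.

```python
S1 = "peqq421gaspts3nuulq5mwcq"
S2 = "poiq41111spts00000q5aaaa"

# (ID, SessionID, Filter) from the question's sample data.
ROWS = [
    (4, S1, "meagPixel=5"),
    (6, S1, "brand=Canon"),
    (7, S1, "brand=Canon&meagPixel=12.1"),
    (8, S1, "brand=Canon"),
    (10, S1, "brand=Nikon"),
    (12, S1, "meagPixel=12.1"),
    (13, S1, "meagPixel=12.1&opticalZoom=True"),
    (14, S1, "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
    (16, S1, "price=500.00"),
    (18, S1, "price=499.00"),
    (19, S1, "price=499.00&brand=Olympus"),
    (21, S1, "zoomRange=2000"),
    (22, S1, "zoomRange=2000&brand=Leica"),
    (23, S1, "zoomRange=2000&brand=Leica&price=1995.00"),
    (24, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True"),
    (25, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"),
    (26, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345"),
    (27, S1, "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2"),
    (41, S2, "meagPixel=5"),
]


def first_term(filter_text):
    """Return the part of the filter before the first '&'."""
    return filter_text.split("&", 1)[0]


def longest_per_search(rows):
    """Keep the longest filter of each run of rows sharing a session and first term."""
    kept = []
    prev_key = None
    # Walk each session in ID order, as SessionOrder does in the SQL.
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        key = (row[1], first_term(row[2]))
        if key != prev_key:
            # First term (or session) changed: this row starts a new search.
            kept.append(row)
            prev_key = key
        elif len(row[2]) > len(kept[-1][2]):
            # Same search, longer filter: replace the current best.
            kept[-1] = row
    return sorted(kept)


unique_ids = [row[0] for row in longest_per_search(ROWS)]
print(unique_ids)
```

In this sketch a first term that recurs later in the same session starts a separate search, which is not the same as grouping by first term alone.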

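Normally I would put all of this on SQLFiddle rather than post a full working example here, but it doesn't seem to be working today. So here is the full SQL I used to test against your data: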

CREATE TABLE #Session (ID INT, SessionID VARCHAR(50), Date DATETIME, Filter VARCHAR(200))
INSERT INTO #Session VALUES
    (2, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (4, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=5'),
    (6, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (7, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon&meagPixel=12.1'),
    (8, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (10, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Nikon'),
    (12, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1'),
    (13, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1&opticalZoom=True'),
    (14, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'meagPixel=12.1&opticalZoom=True&brand=Panasonic'),
    (16, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=500.00'),
    (18, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00'),
    (19, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00&brand=Olympus'),
    (21, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000'),
    (22, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica'),
    (23, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00'),
    (24, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True'),
    (25, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'),
    (26, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:50', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345'),
    (27, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:58', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'),
    (41, 'poiq41111spts00000q5aaaa', '27/05/2012 13:48', 'meagPixel=5')

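;WITH CTE AS
(   SELECT  *, 
            SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm],
            -- Assumed definition (this line is missing from the post): order of appearance within (SessionID, Date)
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder]
    FROM    #Session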
), CTE2 AS
(   SELECT  T1.*, 
            CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch]
    FROM    CTE T1
            LEFT JOIN CTE T2
                ON  T1.SessionID = T2.SessionID
                AND T1.Date = T2.Date
                AND T1.FirstTerm != T2.FirstTerm
                AND T1.SessionOrder = T2.SessionOrder + 1
), CTE3 AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder]
    FROM    CTE2 T1
            OUTER APPLY
            (   SELECT  SUM(NewSearch) [SearchNumber]
                FROM    CTE2 T2
                WHERE   T1.SessionOrder >= T2.SessionOrder
                AND     T1.SessionID = T2.SessionID
                AND     T1.Date = T2.Date
            ) c
)
SELECT  ID, SessionID, Date, Filter
FROM    CTE3
WHERE   SearchOrder = 1
ORDER BY ID

DROP TABLE #Session

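Addendum

OK, based on your result set, you don't actually want to group by the Date column; you just need to order the rows by length, grouped by first search term and SessionID.

This query produces the same results as your sample data. I tested it on 2008 R1, but I see no reason why it wouldn't work in SQL-Server CE.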

;WITH CTE AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) ORDER BY LEN(Filter) DESC) [RowNumber]
    FROM    Session
)
SELECT  *
FROM    CTE
WHERE   RowNumber = 1
ORDER BY ID

SQL Fiddle of the final solution