Question

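I have a table containing logs from a web portal. It holds the visited URL, the request duration, the referrer...

One of the columns is the path info, which contains strings like these: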

/admin/
/export/
/project2/
/project1/news
/project1/users
/user/id/1
/user/id/1/history
/user/id/2
/forum/topic/14/post/456

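I want to compute some statistics from this column with SQL queries, so I'd like to know how to aggregate on the first part of the path info.

That would let me count the number of URLs starting with /admin/, /export/, /project1/, /project2/, /user/, /forum/, ...

This is easy to do with regular expressions in a programming language, but I have read that no comparable function exists in SQL Server.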

Answer 1

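I would use CHARINDEX() to find the first '/' that occurs after the leading '/' character (the third argument, 2, starts the search at the second character). LEFT() then keeps everything up to and including that slash, so anything after the second slash is dropped.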

  select
      LEFT(pathInfo, CHARINDEX('/', pathInfo, 2)) as RootLevelPath,
      count(*) as Hits
  from
      temp
  group by
      LEFT(pathInfo, CHARINDEX('/', pathInfo, 2))

Working result from SQLFiddle
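
One edge case worth noting: if a path has no second slash (for example '/admin'), CHARINDEX() returns 0 and LEFT() returns an empty string, so such rows would all land in one blank group. A minimal sketch of a workaround, assuming it is acceptable to treat '/admin' and '/admin/' as the same root, is to append a trailing '/' before searching. The table variable @log and its sample rows are hypothetical and only there to make the snippet self-contained:

```sql
-- Hypothetical sample data; @log stands in for the real log table.
declare @log table (pathInfo varchar(100));
insert into @log (pathInfo) values
    ('/admin/'), ('/admin'), ('/project1/news'), ('/project1/users'), ('/user/id/1');

-- Appending '/' guarantees that a second slash exists, so CHARINDEX never returns 0 here.
select
    LEFT(pathInfo + '/', CHARINDEX('/', pathInfo + '/', 2)) as RootLevelPath,
    count(*) as Hits
from
    @log
group by
    LEFT(pathInfo + '/', CHARINDEX('/', pathInfo + '/', 2))
order by
    Hits desc;
```

With this data, '/admin' and '/admin/' are counted together under '/admin/' (2 hits), '/project1/' gets 2 hits and '/user/' gets 1.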

Answer 2

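DRapp's answer works well for grouping on the first fragment of the URL. If you need to group by other levels, though, nested LEFT / CHARINDEX expressions can become hard to manage.

Here is one way to group by a parameterized level: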

declare @t table (pathId int identity(1,1) primary key, somePath varchar(100));
insert into @t
    select '/admin/' union all
    select '/export/' union all
    select '/project2/' union all
    select '/project1/news' union all
    select '/project1/users' union all
    select '/user/id/1' union all
    select '/user/id/1/history' union all
    select '/user/id/2' union all
    select '/forum/topic/14/post/456' union all
    select '/forum/topic/14/post/789' union all
    select '/forum/topic/14/post/789'


declare @level int =1;

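-- Turn each path into one <r>...</r> element per fragment and keep only the first @level of them.
-- Note: the cast to xml fails if a path contains XML-special characters such as '&' or '<'.
;with fragments as
(   select  pathId,
            [n] = x.query('.'),                         -- kept fragments as xml
            [Fragment] = x.value('.', 'varchar(100)')   -- kept fragments concatenated: the grouping key
    from    (   select  PathId, 
                        cast('<r>' + replace(stuff(somePath, 1, 1, ''), '/', '</r><r>') + '</r>' as xml)
                            .query('r[position()<=sql:variable("@level")]')
                from @t
            ) d (PathId, X)
)
-- Count the rows per key and rebuild a readable path ('/a/b') from the kept fragments.
select  count(*), [path] = max(r.v)
from    fragments f
cross
apply   (   select  '/' + p.n.value('.', 'varchar(100)')
            from    fragments
            cross
            apply   n.nodes('r')p(n)
            where   PathId = f.PathId
            for xml path('')
        ) r(v)
group
by      fragment;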
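
To see what the XML step in the query above does, it can help to run it on a single value. The following is only an illustrative sketch: the variable @somePath and its sample value are assumptions, not part of the original answer.

```sql
declare @level int = 2;
declare @somePath varchar(100) = '/user/id/1/history';  -- hypothetical sample path

select
    -- Strip the leading '/', then wrap every fragment in its own <r> element.
    cast('<r>' + replace(stuff(@somePath, 1, 1, ''), '/', '</r><r>') + '</r>' as xml)
        as AllFragments,
    -- Keep only the first @level <r> elements.
    cast('<r>' + replace(stuff(@somePath, 1, 1, ''), '/', '</r><r>') + '</r>' as xml)
        .query('r[position()<=sql:variable("@level")]')
        as KeptFragments;
```

For this path, AllFragments should contain four <r> elements (user, id, 1, history) and KeptFragments only the first two (user, id). Changing @level changes how many fragments survive, which is what makes the grouping level a parameter.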

How do I group by part of a string?
