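I have a log table with a varchar column that contains exception and stack trace data.
I want to query this log table to get counts of similar exceptions.
How can I aggregate entries that are similar but not exact matches?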
MyApp.MyCustomException: UserId 1 not found
MyApp.MyCustomException: UserId 2 not found
MyApp.MyCustomException: UserId 3 not found
MyApp.MyCustomException: UserId 1 login failed
MyApp.MyCustomException: UserId 2 login failed
MyApp.MyCustomException: UserId 3 login failed
The six rows above should be counted as:
"MyApp.MyCustomException: UserId not found" Count:3
"MyApp.MyCustomException: UserId login failed" Count:3
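The LEFT function would work for the simple example above, but not for exceptions such as NullReferenceException, where the error can occur at several different places in the code.
Edit: Updated the example to represent the problem more clearly.
Answer 0 (score: 3)
You can try using
patindex('%pattern%',column)
The whole select could look something like this: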
SELECT * FROM tbl
WHERE patindex('%MyApp.MyCustomException: % not found%',err)>0
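Make sure not to forget the % both before and after the pattern. The function returns the position at which the pattern was found in the column, or 0 if it was not found.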
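To make the return value concrete, here is a minimal sketch that calls PATINDEX on a literal string instead of a table column. The literal is one of the sample messages from the question; the column aliases are only illustrative.

```sql
-- PATINDEX returns the 1-based position of the first match, or 0 when there is none.
SELECT
    PATINDEX('%not found%',    'MyApp.MyCustomException: UserId 1 not found') AS pos_match,    -- greater than 0
    PATINDEX('%login failed%', 'MyApp.MyCustomException: UserId 1 not found') AS pos_no_match; -- 0
```

This is why the WHERE clause above compares the result with >0.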
See an example here: http://sqlfiddle.com/#!3/1a70e/1
Edit:
It can also be done with a CTE, like this:
WITH msgs AS(
SELECT err,CASE
WHEN patindex('%MyApp.MyCustomException: % not found%',err)>0 THEN 1
WHEN patindex('%Wrong password for %, please try again%',err)>0 THEN 2
ELSE 0 END msgno FROM tbl )
SELECT msgno, MIN(err) msg1, COUNT(*) cnt FROM msgs GROUP BY msgno
See here: http://sqlfiddle.com/#!3/9565c/2
2nd edit:
Or, in a more general way:
WITH pats as (SELECT 'UserId' pat -- define various patterns for
UNION ALL SELECT 'IP' -- words to be removed after ...
), pos1 AS ( -- find position of pattern
SELECT pat,err msg,patindex('%'+pat+'%',err)+len(pat) p1 FROM tbl,pats
), pos2 AS ( -- remove word after pattern
SELECT LEFT(msg,p1)
+'<'+pat+'> '
+SUBSTRING(msg,charindex(' ',SUBSTRING(msg,p1+1,256))+p1,256) msg
FROM pos1 WHERE p1>len(pat)
), nonames AS ( -- find non-specific messages
SELECT err FROM tbl WHERE NOT EXISTS
(SELECT 1 FROM pos1 WHERE msg=err AND p1>len(pat))
)
SELECT msg, count(*) cnt FROM -- combine all, group and count
( SELECT msg FROM pos2 UNION ALL SELECT err FROM nonames ) m
GROUP BY msg
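In every message, this removes the first word (= a sequence of characters without spaces) that follows one of several predefined patterns (pat) and puts a <pat> placeholder in its place. Messages of the same type then look exactly alike, so they can be grouped.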
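As a rough illustration of what pos1 and pos2 compute, the same arithmetic can be run against a single message. This is only a sketch: the variables @msg, @pat and @p1 are introduced here for the example and are not part of the query above.

```sql
DECLARE @msg varchar(256) = 'MyApp.MyCustomException: UserId 1 not found';
DECLARE @pat varchar(20)  = 'UserId';

-- Same expression as in pos1: the position just past the end of the pattern.
DECLARE @p1 int = PATINDEX('%' + @pat + '%', @msg) + LEN(@pat);

-- Same expression as in pos2: keep everything up to @p1, insert the placeholder,
-- then continue from the first space after the word that followed the pattern.
SELECT LEFT(@msg, @p1)
       + '<' + @pat + '> '
       + SUBSTRING(@msg, CHARINDEX(' ', SUBSTRING(@msg, @p1 + 1, 256)) + @p1, 256) AS normalized;
```

Changing the 1 in @msg to any other id gives the same normalized text, which is what lets GROUP BY put those rows in one bucket. One caveat: if the word after the pattern is the last word of the message, CHARINDEX finds no space and returns 0, so that word is left in place.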
You can try it here (my final solution): http://sqlfiddle.com/#!3/a2fb9/4
Answer 1 (score: 3)
I would just use like with case:
-- select and group on the normalized text, not on the raw trace
select canonical_trace, count(*) as cnt
from (select l.*,
             (case when trace like 'MyApp.MyCustomException: UserId % not found'
                   then 'MyApp.MyCustomException: UserId not found'
                   when trace like 'MyApp.MyCustomException: UserId % login failed'
                   then 'MyApp.MyCustomException: UserId login failed'
                   else trace
              end) as canonical_trace
      from log l
     ) l
group by canonical_trace;
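Answer 2 (score: 0)
This may look ugly, but it should work reasonably well. I use REPLACE to get rid of the digits and the extra spaces before grouping. Take a look: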
WITH yourTable
AS
(
SELECT *
FROM
(
VALUES ('MyApp.MyCustomException: UserId 1 not found'),
('MyApp.MyCustomException: UserId 2 not found'),
('MyApp.MyCustomException: UserId 3 not found'),
('MyApp.MyCustomException: UserId 1 login failed'),
('MyApp.MyCustomException: UserId 2 login failed'),
('MyApp.MyCustomException: UserId 3 login failed')
) A(col)
)
SELECT generic_col,
COUNT(*) AS cnt,
'Count: ' + CAST(COUNT(*) AS VARCHAR(25)) AS formatted_cnt
FROM yourTable
-- strip the digits 0-9, then collapse the double space left behind into a single space
CROSS APPLY (SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(col,'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'0',''),'  ',' ')) AS CA(generic_col)
GROUP BY generic_col
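If the server is SQL Server 2017 or later, the ten nested REPLACE calls for the digits can probably be shortened with TRANSLATE. The sketch below is a variant, not part of the original answer, and it rests on one assumption: the character # never occurs in the logged messages, because it is used as a temporary stand-in for every digit.

```sql
WITH yourTable AS
(
    SELECT col
    FROM (VALUES ('MyApp.MyCustomException: UserId 1 not found'),
                 ('MyApp.MyCustomException: UserId 2 not found'),
                 ('MyApp.MyCustomException: UserId 1 login failed')
         ) A(col)
)
SELECT generic_col,
       COUNT(*) AS cnt
FROM yourTable
CROSS APPLY (
    -- 1) map every digit to '#', 2) delete the '#' characters,
    -- 3) collapse the double space left behind into a single space
    SELECT REPLACE(REPLACE(TRANSLATE(col, '0123456789', '##########'), '#', ''), '  ', ' ')
) AS CA(generic_col)
GROUP BY generic_col;
```

As with the REPLACE version, digits are removed everywhere in the text, not only in the user id.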