How to merge multiple records containing strings and NULLs in a Stream Analytics GROUP BY

Posted: 2017-05-24 17:49:13

Tags: sql-server azure-application-insights azure-stream-analytics

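I'm trying to extract some logged events from Application Insights into our SQL database. I have no control over the input format: the input is a set of JSON files, each made up of multiple JSON arrays. In each record, five pieces of information sit in a JSON array at [context].[custom].[dimensions], and I flatten these values with OUTER APPLY. The problem is that instead of one row per record, I get five rows, as if the record had been joined to a five-row set (which is effectively what happened). For each of the five values, four rows contain NULL and one row contains the actual value. I only need two of the five values, PageType and UserId, and with my GROUP BY the query returns three rows: one holding the PageType, one holding the UserId, and one where both are NULL.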

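In ordinary SQL you would just use MAX to pick up the actual value of each column, but in Stream Analytics you can't use MAX on strings. COALESCE doesn't work either, and neither did a few other approaches I tried. Any ideas how to turn this result:

EventDateTime  Event     PageType  UserId  AppVersion  CountA
2017-05-24     Nav Show  NULL      NULL    2.0.1293    1
2017-05-24     Nav Show  NULL      SIRTSW  2.0.1293    1
2017-05-24     Nav Show  Trade     NULL    2.0.1293    1

into this?

EventDateTime  Event     PageType  UserId  AppVersion  CountA
2017-05-24     Nav Show  Trade     SIRTSW  2.0.1293    1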
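To make the target concrete: in ordinary T-SQL, grouping on the non-null key columns and taking MAX(PageType) and MAX(UserId) works because MAX ignores NULLs. The JavaScript sketch below mimics that collapse outside Stream Analytics. The mergeRows helper is hypothetical and the row literals are copied from the table above. CountA is deliberately left out, because summing it over the duplicated rows would count the single event three times.

```javascript
// Hypothetical helper: collapse rows that share the same key columns,
// keeping the first non-null value of each value column. When a column has
// at most one non-null value per group, this matches MAX() in plain T-SQL.
function mergeRows(rows, keyCols, valueCols) {
  const groups = new Map();
  for (const row of rows) {
    // Composite grouping key built from the key columns.
    const key = JSON.stringify(keyCols.map((c) => row[c]));
    if (!groups.has(key)) {
      const fresh = {};
      for (const c of keyCols) fresh[c] = row[c];
      for (const c of valueCols) fresh[c] = null;
      groups.set(key, fresh);
    }
    const merged = groups.get(key);
    for (const c of valueCols) {
      // Fill a column only once, from the first row that has a value.
      if (merged[c] === null && row[c] != null) merged[c] = row[c];
    }
  }
  return [...groups.values()];
}

// The three rows from the question (CountA omitted on purpose).
const questionRows = [
  { EventDateTime: "2017-05-24", Event: "Nav Show", AppVersion: "2.0.1293", PageType: null, UserId: null },
  { EventDateTime: "2017-05-24", Event: "Nav Show", AppVersion: "2.0.1293", PageType: null, UserId: "SIRTSW" },
  { EventDateTime: "2017-05-24", Event: "Nav Show", AppVersion: "2.0.1293", PageType: "Trade", UserId: null },
];

console.log(mergeRows(questionRows, ["EventDateTime", "Event", "AppVersion"], ["PageType", "UserId"]));
```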

The code below returns three rows per event (note that e.event is an array with a single item, so it does not cause the same problem):

SELECT flatEvent.ArrayValue.name AS Event,
       e.context.data.eventTime AS EventDateTime,
       e.context.application.version AS AppVersion,
       flatCustom.ArrayValue.UserId AS UserId,
       flatCustom.ArrayValue.PageType AS PageType,
       SUM(flatEvent.ArrayValue.count) AS CountA
INTO [insights]
FROM [ios] e
CROSS APPLY GetArrayElements(e.[event]) AS flatEvent
OUTER APPLY GetArrayElements(e.[context].[custom].[dimensions]) AS flatCustom
GROUP BY SlidingWindow(minute, 1),
         flatEvent.ArrayValue.name,
         e.context.data.eventTime,
         e.context.application.version,
         flatCustom.ArrayValue.UserId,
         flatCustom.ArrayValue.PageType
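Why three rows come back: OUTER APPLY GetArrayElements(...) emits one row per element of the dimensions array, and each row carries only that element's values, so the GROUP BY sees distinct (UserId, PageType) combinations. The plain-JavaScript sketch below imitates that flattening for a single hypothetical event whose dimension values are taken from the table above (three entries rather than the five in the real data). It is an illustration of the row multiplication, not how the Stream Analytics engine is implemented; flatten and sampleEvent are made-up names.

```javascript
// Illustration only: emulate CROSS APPLY / OUTER APPLY GetArrayElements()
// over one event. The event literal below is a hypothetical sample.
function flatten(e) {
  const rows = [];
  const dims = e.context.custom.dimensions;
  // OUTER APPLY still produces one row (with NULLs) when the array is missing or empty.
  const dimElements = dims && dims.length > 0 ? dims : [{}];
  for (const ev of e.event) {          // CROSS APPLY GetArrayElements(e.[event])
    for (const d of dimElements) {     // OUTER APPLY GetArrayElements(...[dimensions])
      rows.push({
        Event: ev.name,
        EventDateTime: e.context.data.eventTime,
        AppVersion: e.context.application.version,
        UserId: d.UserId ?? null,
        PageType: d.PageType ?? null,
      });
    }
  }
  return rows;
}

const sampleEvent = {
  context: {
    data: { eventTime: "2017-05-24" },
    application: { version: "2.0.1293" },
    custom: {
      dimensions: [
        { PageType: null, UserId: "SIRTSW" },
        { PageType: "Trade", UserId: null },
        { PageType: null, UserId: null },
      ],
    },
  },
  event: [{ name: "Nav Show", count: 1 }],
};

// One event with three dimension elements yields three rows, as in the question.
console.log(flatten(sampleEvent));
```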

Thanks in advance, Rob

1 answer:

Answer 0 (score: 1)

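Based on your scenario, I think you can use a JavaScript user-defined function (UDF) in Azure Stream Analytics to merge the multiple dimensions into a single record. Below is my test of this approach for your reference.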

JSON file

{
  "context":{
     "data":{"eventTime":"2017-05-24"},
     "application":{"version":"2.0.1293"},
     "custom":{
        "dimensions":[
           {"PageType":null,"UserId":"SIRTSW"},
           {"PageType":"Trade","UserId":null},
           {"PageType":null,"UserId":null}
        ]
     }
  },
  "event":[
    {"name":"Nav Show","count":1}
  ]
}

JavaScript UDF (registered as UDF.coalesce)

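// UDF.coalesce: merge the dimension objects of one event into a single record.
// Returns a one-element array so the query can still use GetArrayElements on it.
function main(items) {
    var userIdStr = "", pageTypeStr = "";
    // OUTER APPLY case: an event may have no dimensions array at all.
    if (items === null || items === undefined) {
        items = [];
    }
    for (var i = 0; i < items.length; i++) {
        // "!= null" is true for values that are neither null nor undefined.
        if (items[i].UserId != null)
            userIdStr += items[i].UserId;
        if (items[i].PageType != null)
            pageTypeStr += items[i].PageType;
    }
    return [{ UserId: userIdStr, PageType: pageTypeStr }];
}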
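Note that UDF.coalesce concatenates every non-null value it finds, so if the same field were non-null in two elements the values would be glued together (for example "SIRTSWSIRTSW"), and a field that is missing everywhere comes back as an empty string rather than NULL. If that matters, a variant that keeps only the first non-null value might look like the sketch below. It is a hypothetical alternative, not part of the original answer; it keeps the same `function main(items)` shape and one-element array return value as the UDF above so it could replace it in the query, and the sample values come from the JSON file above.

```javascript
// Hypothetical alternative to UDF.coalesce: keep the FIRST non-null value
// for each field instead of concatenating all non-null values.
// Like the original, it returns a one-element array for GetArrayElements.
function main(items) {
    var fields = ["UserId", "PageType"];
    var merged = { UserId: null, PageType: null };
    items = items || []; // no dimensions array -> treat as empty
    for (var i = 0; i < items.length; i++) {
        for (var j = 0; j < fields.length; j++) {
            var f = fields[j];
            if (merged[f] === null && items[i] && items[i][f] != null) {
                merged[f] = items[i][f];
            }
        }
    }
    return [merged];
}

// Local check with the dimensions from the sample JSON file.
var sample = [
    { PageType: null, UserId: "SIRTSW" },
    { PageType: "Trade", UserId: null },
    { PageType: null, UserId: null }
];
console.log(JSON.stringify(main(sample)));
// prints [{"UserId":"SIRTSW","PageType":"Trade"}]
```

Returning null instead of "" means an event with no UserId produces a NULL column in the output, which is usually what a SQL sink expects.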

Query

-- first query: collapse each event's dimensions array into one record with the UDF
WITH f AS (
    SELECT
        e.context.data.eventTime AS EventDateTime,
        e.context.application.version AS AppVersion,
        e.event AS flatEvent,
        UDF.coalesce(e.[context].[custom].[dimensions]) AS flatDimensions
    FROM [ios] e
)

-- second query: flatten the events and the one-element dimensions array, then aggregate
SELECT flatEvent.ArrayValue.name AS Event,
       f.EventDateTime,
       f.AppVersion,
       flatDimension.ArrayValue.UserId,
       flatDimension.ArrayValue.PageType,
       SUM(flatEvent.ArrayValue.count) AS CountA
FROM f
CROSS APPLY GetArrayElements(f.[flatEvent]) AS flatEvent
OUTER APPLY GetArrayElements(f.[flatDimensions]) AS flatDimension
GROUP BY SlidingWindow(minute, 1),
         flatEvent.ArrayValue.name,
         f.EventDateTime,
         f.AppVersion,
         flatDimension.ArrayValue.UserId,
         flatDimension.ArrayValue.PageType
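To see why the rewritten query yields a single row, here is a plain-JavaScript simulation of its two steps on the sample event. Step 1 collapses the dimensions array to one element (this sketch keeps the first non-null value instead of concatenating, which gives the same result for this sample), and step 2 flattens the events and the one-element array, then groups and sums. It ignores SlidingWindow entirely and is only an illustration under those assumptions; coalesceDims, runQuery and sampleEvent are hypothetical names.

```javascript
// Sketch (plain JavaScript, not Stream Analytics) of the two-step query.
// Step 1 (WITH f): collapse the dimensions array to a one-element array.
function coalesceDims(items) {
  const merged = { UserId: null, PageType: null };
  for (const d of items || []) {
    for (const k of Object.keys(merged)) {
      if (merged[k] === null && d[k] != null) merged[k] = d[k];
    }
  }
  return [merged];
}

// Step 2: CROSS APPLY events, OUTER APPLY the one-element array,
// then GROUP BY the output columns and SUM the counts (no windowing here).
function runQuery(events) {
  const groups = new Map();
  for (const e of events) {
    const flatDimensions = coalesceDims(e.context.custom.dimensions);
    for (const ev of e.event) {
      for (const dim of flatDimensions) {
        const row = {
          Event: ev.name,
          EventDateTime: e.context.data.eventTime,
          AppVersion: e.context.application.version,
          UserId: dim.UserId,
          PageType: dim.PageType,
        };
        const key = JSON.stringify(row); // all non-aggregated columns form the key
        const acc = groups.get(key) || { ...row, CountA: 0 };
        acc.CountA += ev.count; // SUM(flatEvent.ArrayValue.count)
        groups.set(key, acc);
      }
    }
  }
  return [...groups.values()];
}

const sampleEvent = {
  context: {
    data: { eventTime: "2017-05-24" },
    application: { version: "2.0.1293" },
    custom: {
      dimensions: [
        { PageType: null, UserId: "SIRTSW" },
        { PageType: "Trade", UserId: null },
        { PageType: null, UserId: null },
      ],
    },
  },
  event: [{ name: "Nav Show", count: 1 }],
};

console.log(runQuery([sampleEvent]));
```

Because step 1 leaves exactly one dimensions element per event, the OUTER APPLY no longer multiplies rows, and SUM(count) counts each event once.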

Test result: [screenshot]