Correlating sequences of independent events - calculating time intersection

Time: 2016-11-06 17:02:40

Tags: sql sql-server powerbi dax

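We are building a PowerBI reporting solution, and just as I (well, Stack) solved one problem, the business came up with a new reporting idea. I am not sure of the best way to approach it, as I know very little about PowerBI and the business seems to want quite complex reports.

We have two sequences of events from separate data sources. Both contain independent events that happen to vehicles. One describes which location a vehicle is at; the other describes incident events that carry a reason code for the incident. The business wants to report on the time spent at each location. A vehicle can change location completely independently of the incident events that occur: the events are actually date-times and happen at random points throughout the day. Each type of event has a start time, an end time and a VehicleID.

Vehicle location events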

+------------------+-----------+------------+-----------------+----------------+
| LocationDetailID | VehicleID | LocationID |  StartDateTime  |  EndDateTime   |
+------------------+-----------+------------+-----------------+----------------+
|                1 |         1 |          1 |        2012-1-1 |      2016-1-1  |
|                2 |         1 |          2 |        2016-1-1 |      2016-4-1  |
|                3 |         1 |          1 |        2016-4-1 |      2016-11-1 |
|                4 |         2 |          1 |        2011-1-1 |      2016-11-1 |
+------------------+-----------+------------+-----------------+----------------+

Vehicle status events

+---------+---------------+-------------+-----------+--------------+
| EventID | StartDateTime | EndDateTime | VehicleID | ReasonCodeID |
+---------+---------------+-------------+-----------+--------------+
|       1 | 2012-1-1      | 2013-1-1    |         1 |            1 |
|       2 | 2013-1-1      | 2015-1-1    |         1 |            3 |
|       3 | 2015-1-1      | 2016-5-1    |         1 |            4 |
|       4 | 2016-5-1      | 2016-11-1   |         1 |            2 |
|       5 | 2015-9-1      | 2016-2-1    |         2 |            1 |
+---------+---------------+-------------+-----------+--------------+

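Is there any way I can correlate the two streams and calculate the total time per vehicle, per ReasonCode, per location? This seems to require relating the two kinds of event to each other, because a change of location can happen partway through a given ReasonCode.

Calculation example for ReasonCodeID 4

  • VehicleID 1 was at LocationID 1 from 2012-1-1 to 2016-1-1 and from 2016-4-1 to 2016-11-1
  • VehicleID 1 was at LocationID 2 from 2016-1-1 to 2016-4-1
  • VehicleID 1 had ReasonCodeID 4 from 2015-1-1 to 2016-5-1

So the first period at location 1 intersects ReasonCodeID 4 for 365 days (2015-1-1 to 2016-1-1). The second period at location 1 intersects it for 30 days (2016-4-1 to 2016-5-1). The period at location 2 intersects ReasonCodeID 4 for 91 days (2016-1-1 to 2016-4-1).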
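The arithmetic in this example can be checked with a small script. This is only a sketch in Python rather than DAX or T-SQL; the helper name `overlap_days` is made up for illustration, and each range is assumed to end exactly where the next one begins.

```python
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Days shared by two date ranges; 0 if they do not overlap."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max((earliest_end - latest_start).days, 0)


# ReasonCodeID 4 for VehicleID 1, taken from the status table
reason_start, reason_end = date(2015, 1, 1), date(2016, 5, 1)

# (LocationID, start, end) rows for VehicleID 1, taken from the location table
location_periods = [
    (1, date(2012, 1, 1), date(2016, 1, 1)),
    (2, date(2016, 1, 1), date(2016, 4, 1)),
    (1, date(2016, 4, 1), date(2016, 11, 1)),
]

overlaps = [
    (location_id, overlap_days(start, end, reason_start, reason_end))
    for location_id, start, end in location_periods
]
print(overlaps)  # [(1, 365), (2, 91), (1, 30)]
```

Adding the two location 1 periods gives 365 + 30 = 395 days, and location 2 gives 91 days.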

The desired output is as follows.

+-----------+--------------+------------+------------+
| VehicleID | ReasonCodeID | LocationID | Total Days |
+-----------+--------------+------------+------------+
|         1 |            1 |          1 |        366 |
|         1 |            3 |          1 |        730 |
|         1 |            4 |          1 |        395 |
|         1 |            4 |          2 |         91 |
|         1 |            2 |          1 |        184 |
|         2 |            1 |          1 |        154 |
+-----------+--------------+------------+------------+

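I have created a SQL Fiddle that shows the structure here.

There are related tables for the vehicles, and I am sure the business will want the results grouped by vehicle class and so on, but if I can understand how to calculate the intersections in this case, that will give me the basis for the rest of the reports.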

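2 answers:

Answer 0: (score: 3)

I think this solution requires a CROSS JOIN. The relationship between the two tables is many-to-many, which would normally mean creating a third table to bridge the LocationEvents and VehicleStatusEvents tables, so I think it is easier to specify the relationship in the expression itself.

I use a CROSS JOIN between the two tables, then filter the result to keep only the rows where the VehicleID columns of both tables match. I also filter for rows where the VehicleStatusEvents date range intersects the LocationEvents date range.

Once the filtering is done, I add a column that calculates the number of days in each intersection. Finally, the measure sums the days for each VehicleID, ReasonCodeID and LocationID.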
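The steps described above (cross join, filter on vehicle and on overlapping ranges, compute days per intersection, sum per group) can be sketched outside Power BI as follows. This is an illustrative Python version under the assumption that the sample rows from the question are the whole data set; the function name `total_days` and the tuple layout are hypothetical, not part of the answer.

```python
from collections import defaultdict
from datetime import date

# (VehicleID, LocationID, start, end): sample rows from the question
location_events = [
    (1, 1, date(2012, 1, 1), date(2016, 1, 1)),
    (1, 2, date(2016, 1, 1), date(2016, 4, 1)),
    (1, 1, date(2016, 4, 1), date(2016, 11, 1)),
    (2, 1, date(2011, 1, 1), date(2016, 11, 1)),
]

# (VehicleID, ReasonCodeID, start, end): sample rows from the question
status_events = [
    (1, 1, date(2012, 1, 1), date(2013, 1, 1)),
    (1, 3, date(2013, 1, 1), date(2015, 1, 1)),
    (1, 4, date(2015, 1, 1), date(2016, 5, 1)),
    (1, 2, date(2016, 5, 1), date(2016, 11, 1)),
    (2, 1, date(2015, 9, 1), date(2016, 2, 1)),
]


def total_days(locations, statuses):
    """Sum overlap days per (VehicleID, ReasonCodeID, LocationID)."""
    totals = defaultdict(int)
    for loc_vehicle, location_id, loc_start, loc_end in locations:
        for st_vehicle, reason_id, st_start, st_end in statuses:  # cross join
            if loc_vehicle != st_vehicle:
                continue  # keep only rows for the same vehicle
            if loc_start > st_end or loc_end < st_start:
                continue  # keep only intersecting date ranges
            days = (min(loc_end, st_end) - max(loc_start, st_start)).days
            totals[(loc_vehicle, reason_id, location_id)] += days
    return dict(totals)


result = total_days(location_events, status_events)
for key in sorted(result):
    print(key, result[key])
```

For vehicle 1 this gives 366, 184, 730, 395 and 91 days for the five reason and location combinations, matching the desired output. For vehicle 2 it gives 153, one less than the 154 listed in the question, because the plain day difference between 2015-9-1 and 2016-2-1 is 153.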

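To implement the CROSS JOIN, you have to rename the VehicleID, StartDateTime and EndDateTime columns in either of the two tables. This is needed to avoid an error about conflicting column names.

I renamed the columns as follows:

+---------------+-----------------------+---------------------+
| Column        | LocationEvents        | VehicleStatusEvents |
+---------------+-----------------------+---------------------+
| VehicleID     | LocationVehicleID     | StatusVehicleID     |
| StartDateTime | LocationStartDateTime | StatusStartDateTime |
| EndDateTime   | LocationEndDateTime   | StatusEndDateTime   |
+---------------+-----------------------+---------------------+

After this, you can use CROSSJOIN in the Total Days measure: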

Total Days =
SUMX (
    FILTER (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN ( LocationEvents, VehicleStatusEvents ),
                LocationEvents[LocationVehicleID] = VehicleStatusEvents[StatusVehicleID]
                    && LocationEvents[LocationStartDateTime] <= VehicleStatusEvents[StatusEndDateTime]
                    && LocationEvents[LocationEndDateTime] >= VehicleStatusEvents[StatusStartDateTime]
            ),
            "CountOfDays", IF (
                [LocationStartDateTime] <= [StatusStartDateTime]
                    && [LocationEndDateTime] >= [StatusEndDateTime],
                DATEDIFF ( [StatusStartDateTime], [StatusEndDateTime], DAY ),
                IF (
                    [LocationStartDateTime] > [StatusStartDateTime]
                        && [LocationEndDateTime] >= [StatusEndDateTime],
                    DATEDIFF ( [LocationStartDateTime], [StatusEndDateTime], DAY ),
                    IF (
                        [LocationStartDateTime] <= [StatusStartDateTime]
                            && [LocationEndDateTime] <= [StatusEndDateTime],
                        DATEDIFF ( [StatusStartDateTime], [LocationEndDateTime], DAY ),
                        IF (
                            [LocationStartDateTime] >= [StatusStartDateTime]
                                && [LocationEndDateTime] <= [StatusEndDateTime],
                            DATEDIFF ( [LocationStartDateTime], [LocationEndDateTime], DAY ),
                            BLANK ()
                        )
                    )
                )
            )
        ),
        LocationEvents[LocationID] = [LocationID]
            && VehicleStatusEvents[ReasonCodeID] = [ReasonCodeID]
    ),
    [CountOfDays]
)

Then in Power BI you can build a matrix (or any other visualization) using this measure.

If you do not fully understand the measure expression, here is the T-SQL translation:

SELECT
    dt.VehicleID,
    dt.ReasonCodeID,
    dt.LocationID,
    SUM(dt.Diff) [Total Days]
FROM 
(
    SELECT
        CASE
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- Inside range
               THEN DATEDIFF(DAY, b.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime > b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- |-----|*****|....|
               THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |...|****|-----|
               THEN DATEDIFF(DAY, b.StartDateTime, a.EndDateTime)
            WHEN a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |---|****|-----
               THEN DATEDIFF(DAY, a.StartDateTime, a.EndDateTime)
        END Diff,
        a.VehicleID,
        b.ReasonCodeID,
        a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTime
    FROM LocationEvents a
        CROSS JOIN VehicleStatusEvents b
    WHERE a.VehicleID = b.VehicleID
        AND 
        (
            (a.StartDateTime <= b.EndDateTime)
                AND (a.EndDateTime >= b.StartDateTime)
        )
) dt
GROUP BY dt.VehicleID,
         dt.ReasonCodeID,
         dt.LocationID

Note that in T-SQL you could also use the INNER JOIN operator.
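The four CASE branches (and the matching nested IF calls in the measure) all compute the same quantity: the earliest end minus the latest start. The sketch below checks that claim by brute force over small integer day offsets. It is an illustration in Python, not part of the answer, and the function names are hypothetical.

```python
from itertools import product


def days_by_cases(loc_start, loc_end, st_start, st_end):
    """Mirror of the four CASE branches, evaluated in the same order."""
    if loc_start <= st_start and loc_end >= st_end:
        return st_end - st_start      # status range inside location range
    if loc_start > st_start and loc_end >= st_end:
        return st_end - loc_start     # location starts later, ends later
    if loc_start <= st_start and loc_end <= st_end:
        return loc_end - st_start     # location starts earlier, ends earlier
    if loc_start >= st_start and loc_end <= st_end:
        return loc_end - loc_start    # location range inside status range
    return None


def days_by_bounds(loc_start, loc_end, st_start, st_end):
    """Single expression: earliest end minus latest start."""
    return min(loc_end, st_end) - max(loc_start, st_start)


mismatches = []
for ls, le, ss, se in product(range(7), repeat=4):
    if ls > le or ss > se:
        continue  # skip ranges that end before they start
    if not (ls <= se and le >= ss):
        continue  # same intersection filter as the WHERE clause
    if days_by_cases(ls, le, ss, se) != days_by_bounds(ls, le, ss, se):
        mismatches.append((ls, le, ss, se))

print(mismatches)  # []
```

Because the intersection filter has already removed non-overlapping pairs, the single expression never goes negative here.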

Let me know if this helps.

Answer 1: (score: 1)

select      coalesce(l.VehicleID,s.VehicleID)   as VehicleID
           ,s.ReasonCodeID
           ,l.LocationID

           ,sum
            (
                datediff
                (
                    day
                   ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end
                   ,case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end
                )
            )   as TotalDays

from                    VehicleLocationEvents   as l

            full join   VehicleStatusEvents     as s

            on          s.VehicleID =
                        l.VehicleID

                    and case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end   <=
                        case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end   

group by    coalesce(l.VehicleID,s.VehicleID)
           ,s.ReasonCodeID
           ,l.LocationID

select      VehicleID
           ,ReasonCodeID
           ,LocationID
           ,sum (datediff (day,max_StartDateTime,min_EndDateTime))  as TotalDays

from       (select      coalesce(l.VehicleID,s.VehicleID)   as VehicleID
                       ,s.ReasonCodeID
                       ,l.LocationID

                       ,case when s.StartDateTime > l.StartDateTime then s.StartDateTime else l.StartDateTime end   as max_StartDateTime
                       ,case when s.EndDateTime   < l.EndDateTime   then s.EndDateTime   else l.EndDateTime   end   as min_EndDateTime

            from                    VehicleLocationEvents   as l

                        full join   VehicleStatusEvents     as s

                        on          s.VehicleID =
                                    l.VehicleID
            ) ls

where       max_StartDateTime <= min_EndDateTime

group by    VehicleID
           ,ReasonCodeID
           ,LocationID
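One point to keep in mind with both queries: T-SQL DATEDIFF(day, ...) counts the calendar-day boundaries crossed, not elapsed 24-hour periods. The sample data uses whole dates, so the two agree there, but the question notes that the real events are date-times. The sketch below imitates both behaviours in Python; the timestamps are hypothetical and the function names are made up for illustration.

```python
from datetime import datetime


def boundary_days(start, end):
    """Calendar-day boundaries crossed, imitating DATEDIFF(day, start, end)."""
    return (end.date() - start.date()).days


def elapsed_days(start, end):
    """Elapsed time expressed in fractional 24-hour days."""
    return (end - start).total_seconds() / 86400


# Hypothetical two-hour event that straddles midnight
start = datetime(2016, 1, 1, 23, 0)
end = datetime(2016, 1, 2, 1, 0)

print(boundary_days(start, end))           # 1
print(round(elapsed_days(start, end), 4))  # 0.0833
```

If sub-day precision matters for the report, one option is to take the difference in a finer unit such as minutes and convert afterwards.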