Dynamically building complex rollups in MySQL

Time: 2018-05-01 11:20:37

Tags: mysql sql data-analysis

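This is really several questions in one, and I apologize for overloading the post, but I need to get these done elegantly. I can handle simple queries in MySQL, but complex tabulations like these are usually hard for me, and I am not yet familiar with dynamic SQL. I am looking for simple solutions (but not hardcoded ones). I am not sure whether this is too much to ask in a single SO question; if it is, please answer one or two of the aggregations and give me the tools so that I can build the rest myself.

My data is structured as follows: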

+-----------------------------------------------------------------+
|  timestamp           group   url         metric columns here    |
+-----------------------------------------------------------------+
| 2018-05-01 14:30:00 6732    abc.com     -0.3673 -0.0914 4.0183  |
| 2018-05-01 14:30:00 6732    xyz.com      4.2187  0.3407 12.3832 |
| 2018-05-01 14:30:00 6732    pqr.org     -2.3875 -0.4064 5.8743  |
| 2018-05-01 14:30:00 6732    many.com    -4.4194 -1.0665 4.144   |
| 2018-05-01 14:00:00 7174    abc.com     -6.4021 -1.419  4.5117  |
| 2018-05-01 14:00:00 7174    xyz.com     -1.7971 -1.0396 1.7286  |
| 2018-05-01 14:00:00 7174    many.com     0.5276  0.2621 2.013   |
| 2018-05-01 13:30:00 7174    many.com    -0.4941 -0.1098 4.4982  |
| 2018-05-01 13:30:00 7184    diff.com    -0.6783 -0.1384 4.9013  |
| 2018-05-01 13:30:00 7184    sites.com   -0.1293 -0.0246 5.2608  |
| 2018-05-01 13:30:00 7184    here.com    -0.2703 -0.0669 4.0377  |
+-----------------------------------------------------------------+
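For concreteness, the layout above could be declared roughly as follows. This is only a sketch: the table name `url_metrics`, the column names `ts` and `grp`, the column types, the primary key, and the mapping of the three numeric columns to `metric_1`, `metric_2`, and `metric_lg` (in that order) are all assumptions, since the sample does not name them.

```sql
-- Hypothetical schema. ts/grp are used instead of timestamp/group
-- because GROUP is a reserved word in MySQL.
CREATE TABLE url_metrics (
    ts        DATETIME      NOT NULL,  -- window start (sample rows are 30 minutes apart)
    grp       INT           NOT NULL,  -- group id, e.g. 6732
    url       VARCHAR(255)  NOT NULL,
    metric_1  DECIMAL(12,4) NOT NULL,  -- assumed: revenue in dollars
    metric_2  DECIMAL(12,4) NOT NULL,  -- assumed: cost in dollars
    metric_lg DECIMAL(12,4) NOT NULL,  -- assumed: traffic count in thousands
    PRIMARY KEY (ts, grp, url)         -- assumes one row per window, group and URL
);

-- A few rows copied from the sample above.
INSERT INTO url_metrics (ts, grp, url, metric_1, metric_2, metric_lg) VALUES
    ('2018-05-01 14:30:00', 6732, 'abc.com',  -0.3673, -0.0914,  4.0183),
    ('2018-05-01 14:30:00', 6732, 'xyz.com',   4.2187,  0.3407, 12.3832),
    ('2018-05-01 14:00:00', 7174, 'abc.com',  -6.4021, -1.4190,  4.5117),
    ('2018-05-01 13:30:00', 7184, 'diff.com', -0.6783, -0.1384,  4.9013);
```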

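Basically, for each timestamp we have data from different groups; for each group we have URLs; and for each URL we capture metrics. URLs and groups have a many-to-many relationship.

I have to extract and aggregate this data in several ways depending on the situation. Usually I select whichever metrics I need and group by one or more of timestamp, group, and URL. Sometimes, however, I want to see data or aggregates within groups, and I end up running separate queries for them. For example, when the time-level aggregate shows that a metric dropped or rose in a particular time window, I want to drill down into each such window separately. I have to repeat this for every window, because within one window some groups may be up and others down, and drilling further to reach the URL level takes yet another query. What I need is a way to aggregate at the top level (timestamp and group) while also showing aggregates from the levels below. An example:

Something like this would help: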

+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
|      timestamp      | aggregate_1 | aggregate_2 | window_top_group | window_top_group_aggregate_1 | window_top_group_aggregate_2 | window_top_group_2 | window_top_group_2_aggregate_1 | window_top_group_2_aggregate_2 | window_loss_group_1 | window_loss_group_1_aggregate_1 | window_loss_group_1_aggregate_2 |
+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
| 2018-05-01 14:30:00 | -0.3673     | -0.0914     |             6732 | -0.3673                      | -0.3673                      |               7174 | -0.3673                        | -0.3673                        |                7184 | -0.3673                         | -0.3673                         |
| 2018-05-01 14:00:00 | 4.2187      | 0.3407      |             6732 | 4.2187                       | 4.2187                       |               7174 | 4.2187                         | 4.2187                         |                7184 | 4.2187                          | 4.2187                          |
| 2018-05-01 13:30:00 | -2.3875     | -0.4064     |             6732 | -2.3875                      | -2.3875                      |               7174 | -2.3875                        | -2.3875                        |                7184 | -2.3875                         | -2.3875                         |
| 2018-05-01 13:00:00 | -4.4194     | -1.0665     |             6732 | -4.4194                      | -4.4194                      |               7174 | -4.4194                        | -4.4194                        |                7184 | -4.4194                         | -4.4194                         |
| 2018-05-01 12:30:00 | -6.4021     | -1.419      |             7174 | -6.4021                      | -6.4021                      |               7184 | -6.4021                        | -6.4021                        |                6732 | -6.4021                         | -6.4021                         |
| 2018-05-01 12:00:00 | -1.7971     | -1.0396     |             7174 | -1.7971                      | -1.7971                      |               7184 | -1.7971                        | -1.7971                        |                6732 | -1.7971                         | -1.7971                         |
| 2018-05-01 11:30:00 | 0.5276      | 0.2621      |             7174 | 0.5276                       | 0.5276                       |               7184 | 0.5276                         | 0.5276                         |                6732 | 0.5276                          | 0.5276                          |
| 2018-05-01 11:00:00 | -0.4941     | -0.1098     |             7174 | -0.4941                      | -0.4941                      |               6732 | -0.4941                        | -0.4941                        |                7184 | -0.4941                         | -0.4941                         |
| 2018-05-01 10:30:00 | -0.6783     | -0.1384     |             7184 | -0.6783                      | -0.6783                      |               6732 | -0.6783                        | -0.6783                        |                7174 | -0.6783                         | -0.6783                         |
| 2018-05-01 10:00:00 | -0.1293     | -0.0246     |             7184 | -0.1293                      | -0.1293                      |               6732 | -0.1293                        | -0.1293                        |                7174 | -0.1293                         | -0.1293                         |
| 2018-05-01 9:30:00  | -0.2703     | -0.0669     |             7184 | -0.2703                      | -0.2703                      |               6732 | -0.2703                        | -0.2703                        |                7174 | -0.2703                         | -0.2703                         |
+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
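One possible way to produce a result of this shape is sketched below. It assumes MySQL 8.0 or later (for CTEs and window functions) and a hypothetical table `url_metrics(ts, grp, url, metric_1, metric_2, metric_lg)`. Using `SUM` as the aggregate and ranking groups by `metric_1` are also assumptions; the question does not fix either.

```sql
-- Sketch only: assumes MySQL 8.0+ and the hypothetical table url_metrics.
WITH per_group AS (
    -- One row per window and group.
    SELECT ts, grp,
           SUM(metric_1) AS agg_1,
           SUM(metric_2) AS agg_2
    FROM url_metrics
    GROUP BY ts, grp
),
ranked AS (
    -- Rank groups inside each window, best first and worst first.
    SELECT ts, grp, agg_1, agg_2,
           ROW_NUMBER() OVER (PARTITION BY ts ORDER BY agg_1 DESC, grp) AS rn_top,
           ROW_NUMBER() OVER (PARTITION BY ts ORDER BY agg_1 ASC,  grp) AS rn_loss
    FROM per_group
)
SELECT ts,
       SUM(agg_1) AS aggregate_1,
       SUM(agg_2) AS aggregate_2,
       -- Conditional aggregation folds the ranked rows into columns.
       MAX(CASE WHEN rn_top  = 1 THEN grp   END) AS window_top_group,
       MAX(CASE WHEN rn_top  = 1 THEN agg_1 END) AS window_top_group_aggregate_1,
       MAX(CASE WHEN rn_top  = 1 THEN agg_2 END) AS window_top_group_aggregate_2,
       MAX(CASE WHEN rn_top  = 2 THEN grp   END) AS window_top_group_2,
       MAX(CASE WHEN rn_top  = 2 THEN agg_1 END) AS window_top_group_2_aggregate_1,
       MAX(CASE WHEN rn_top  = 2 THEN agg_2 END) AS window_top_group_2_aggregate_2,
       MAX(CASE WHEN rn_loss = 1 THEN grp   END) AS window_loss_group_1,
       MAX(CASE WHEN rn_loss = 1 THEN agg_1 END) AS window_loss_group_1_aggregate_1,
       MAX(CASE WHEN rn_loss = 1 THEN agg_2 END) AS window_loss_group_1_aggregate_2
FROM ranked
GROUP BY ts
ORDER BY ts DESC;
```

When a window contains a single group, that group is reported as both the top and the loss group, and the second-place columns are NULL. The same pattern (aggregate, rank with `ROW_NUMBER()`, then fold with `MAX(CASE ...)`) should extend to a URL level by adding `url` to the first CTE and to the partitioning.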

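Maybe we could even go one level deeper? And, while aggregating by timestamp, get, say, the top URL of the top group, or the top group-URL combination?

A few other aggregations that would be really useful:

1) For a given time range, say a whole month: aggregate by URL, showing the best/worst timestamp and the values over the whole range, but also average them by time of day across the whole month and get aggregates there, as shown: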

+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
|    url    | aggregate_1 | aggregate_2 | best performance timestamp overall | worst performance timestamp overall | peak time of average day | trough time of average day | mean_at_peak | standard_deviation_at_peak | mean_at_trough | standard_deviation_at_trough |
+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
| abc.com   | -0.3673     | -0.3673     | 2018-05-01 14:30:00                | 2018-05-01 14:30:00                 | 2018-05-01 9:30:00       | 2018-05-01 9:30:00         | 0.5276       | 0.5276                     | 0.5276         | 0.5276                       |
| xyz.com   | 4.2187      | 4.2187      | 2018-05-01 14:00:00                | 2018-05-01 14:00:00                 | 2018-05-01 10:00:00      | 2018-05-01 10:00:00        | 0.5276       | 0.5276                     | 0.5276         | 0.5276                       |
| pqr.org   | -2.3875     | -2.3875     | 2018-05-01 13:30:00                | 2018-05-01 13:30:00                 | 2018-05-01 10:30:00      | 2018-05-01 10:30:00        | 4.2187       | 4.2187                     | 4.2187         | 4.2187                       |
| many.com  | -4.4194     | -4.4194     | 2018-05-01 13:00:00                | 2018-05-01 13:00:00                 | 2018-05-01 10:30:00      | 2018-05-01 10:30:00        | 5.449066667  | 5.449066667                | 5.449066667    | 5.449066667                  |
| abc.com   | -6.4021     | -6.4021     | 2018-05-01 12:30:00                | 2018-05-01 10:30:00                 | 2018-05-01 12:00:00      | 2018-05-01 12:00:00        | 4.2187       | 4.2187                     | 4.2187         | 4.2187                       |
| xyz.com   | -1.7971     | -1.7971     | 2018-05-01 12:00:00                | 2018-05-01 12:00:00                 | 2018-05-01 10:30:00      | 2018-05-01 10:30:00        | 0.5276       | 0.5276                     | 0.5276         | 0.5276                       |
| pqr.org   | 0.5276      | 0.5276      | 2018-05-01 11:30:00                | 2018-05-01 10:30:00                 | 2018-05-01 10:30:00      | 2018-05-01 10:30:00        | 7.985716667  | 7.985716667                | 7.985716667    | 7.985716667                  |
| many.com  | -0.4941     | -0.4941     | 2018-05-01 11:00:00                | 2018-05-01 11:00:00                 | 2018-05-01 11:00:00      | 2018-05-01 11:00:00        | 4.2187       | 4.2187                     | 4.2187         | 4.2187                       |
| many.com  | -0.6783     | -0.6783     | 2018-05-01 10:30:00                | 2018-05-01 10:30:00                 | 2018-05-01 9:30:00       | 2018-05-01 9:30:00         | 0.5276       | 0.5276                     | 0.5276         | 0.5276                       |
| sites.com | -0.1293     | -0.1293     | 2018-05-01 10:00:00                | 2018-05-01 10:00:00                 | 2018-05-01 10:30:00      | 2018-05-01 10:30:00        | 9.522366667  | 9.522366667                | 9.522366667    | 9.522366667                  |
| here.com  | -0.2703     | -0.2703     | 2018-05-01 9:30:00                 | 2018-05-01 9:30:00                  | 2018-05-01 10:00:00      | 2018-05-01 10:00:00        | 4.2187       | 4.2187                     | 4.2187         | 4.2187                       |
+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
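The two halves of this table can be sketched as two separate queries. Both assume MySQL 8.0+ and a hypothetical table `url_metrics(ts, grp, url, metric_1, metric_2, metric_lg)`; taking "performance" to mean the sum of `metric_1` per window, and using May 2018 as the range, are assumptions as well.

```sql
-- (a) Best and worst window per URL over the whole range.
WITH per_url_ts AS (
    SELECT url, ts, SUM(metric_1) AS m1
    FROM url_metrics
    WHERE ts >= '2018-05-01' AND ts < '2018-06-01'   -- hypothetical range
    GROUP BY url, ts
),
ranked AS (
    SELECT url, ts, m1,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY m1 DESC, ts) AS rn_best,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY m1 ASC,  ts) AS rn_worst
    FROM per_url_ts
)
SELECT url,
       SUM(m1) AS aggregate_1,
       MAX(CASE WHEN rn_best  = 1 THEN ts END) AS best_performance_ts_overall,
       MAX(CASE WHEN rn_worst = 1 THEN ts END) AS worst_performance_ts_overall
FROM ranked
GROUP BY url;

-- (b) Peak and trough time of the "average day" per URL.
WITH per_url_ts AS (
    SELECT url, ts, SUM(metric_1) AS m1
    FROM url_metrics
    WHERE ts >= '2018-05-01' AND ts < '2018-06-01'
    GROUP BY url, ts
),
by_time_of_day AS (
    -- Collapse all days of the range onto a single day.
    SELECT url, TIME(ts) AS tod,
           AVG(m1)         AS mean_m1,
           STDDEV_SAMP(m1) AS sd_m1     -- NULL when only one day contributes
    FROM per_url_ts
    GROUP BY url, tod
),
ranked AS (
    SELECT url, tod, mean_m1, sd_m1,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY mean_m1 DESC, tod) AS rn_peak,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY mean_m1 ASC,  tod) AS rn_trough
    FROM by_time_of_day
)
SELECT url,
       MAX(CASE WHEN rn_peak   = 1 THEN tod     END) AS peak_time_of_average_day,
       MAX(CASE WHEN rn_trough = 1 THEN tod     END) AS trough_time_of_average_day,
       MAX(CASE WHEN rn_peak   = 1 THEN mean_m1 END) AS mean_at_peak,
       MAX(CASE WHEN rn_peak   = 1 THEN sd_m1   END) AS standard_deviation_at_peak,
       MAX(CASE WHEN rn_trough = 1 THEN mean_m1 END) AS mean_at_trough,
       MAX(CASE WHEN rn_trough = 1 THEN sd_m1   END) AS standard_deviation_at_trough
FROM ranked
GROUP BY url;
```

Joining (a) and (b) on `url` would give the single wide table shown above.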

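2) For a supplied list of URLs, or letting the query itself build the URL list (for example, URLs matching a pattern, or the top 3 URLs by metric_1 in each window), show their percentage contribution to the supplied or required metrics: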

+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
|      timestamp      | metric_1 | contribution_percentage_url_1 | contribution_percentage_url_2 | contribution_percentage_url_3 | metric_2 | contribution_percentage_url_1 | contribution_percentage_url_2 | contribution_percentage_url_3 |
+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
| 2018-05-01 14:30:00 | -0.3673  |                            33 |                            26 |                            18 | -0.3673  |                            53 |                            30 |                            11 |
| 2018-05-01 14:00:00 | 4.2187   |                            33 |                            29 |                            12 | 4.2187   |                            30 |                            32 |                            20 |
| 2018-05-01 13:30:00 | -2.3875  |                            53 |                            29 |                            17 | -2.3875  |                            37 |                            32 |                            11 |
| 2018-05-01 13:00:00 | -4.4194  |                            39 |                            27 |                            19 | -4.4194  |                            31 |                            34 |                            10 |
| 2018-05-01 10:30:00 | -6.4021  |                            41 |                            25 |                            15 | -6.4021  |                            31 |                            30 |                            16 |
| 2018-05-01 12:00:00 | -1.7971  |                            45 |                            27 |                            12 | -1.7971  |                            32 |                            30 |                            12 |
| 2018-05-01 10:30:00 | 0.5276   |                            50 |                            35 |                            18 | 0.5276   |                            41 |                            25 |                            13 |
| 2018-05-01 11:00:00 | -0.4941  |                            33 |                            33 |                            16 | -0.4941  |                            44 |                            34 |                            13 |
| 2018-05-01 10:30:00 | -0.6783  |                            53 |                            33 |                            18 | -0.6783  |                            54 |                            33 |                            16 |
| 2018-05-01 10:00:00 | -0.1293  |                            38 |                            31 |                            14 | -0.1293  |                            42 |                            31 |                            17 |
| 2018-05-01 9:30:00  | -0.2703  |                            30 |                            35 |                            11 | -0.2703  |                            30 |                            35 |                            16 |
+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
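A sketch for the "top 3 URLs by metric_1 in each window" variant follows, again assuming MySQL 8.0+ and a hypothetical table `url_metrics(ts, grp, url, metric_1, metric_2, metric_lg)`. Contribution is taken here to mean a URL's share of the window total, which is an assumption; with negative values, as in the sample data, such shares can fall outside 0 to 100.

```sql
-- Sketch only: assumes MySQL 8.0+ and the hypothetical table url_metrics.
WITH per_url AS (
    SELECT ts, url, SUM(metric_1) AS m1
    FROM url_metrics
    GROUP BY ts, url
),
ranked AS (
    SELECT ts, url, m1,
           SUM(m1)      OVER (PARTITION BY ts)                        AS window_total,
           ROW_NUMBER() OVER (PARTITION BY ts ORDER BY m1 DESC, url)  AS rn
    FROM per_url
)
SELECT ts,
       MAX(window_total) AS metric_1,
       MAX(CASE WHEN rn = 1 THEN url END) AS url_1,
       -- NULLIF avoids a division by zero when the window total is 0.
       MAX(CASE WHEN rn = 1 THEN ROUND(100 * m1 / NULLIF(window_total, 0)) END)
           AS contribution_percentage_url_1,
       MAX(CASE WHEN rn = 2 THEN url END) AS url_2,
       MAX(CASE WHEN rn = 2 THEN ROUND(100 * m1 / NULLIF(window_total, 0)) END)
           AS contribution_percentage_url_2,
       MAX(CASE WHEN rn = 3 THEN url END) AS url_3,
       MAX(CASE WHEN rn = 3 THEN ROUND(100 * m1 / NULLIF(window_total, 0)) END)
           AS contribution_percentage_url_3
FROM ranked
GROUP BY ts
ORDER BY ts DESC;
```

For a supplied URL list, the `rn = n` tests could be replaced by `url = '...'` tests; for a pattern, a `WHERE url LIKE ...` filter in the final query (not in `per_url`, so that the window total still covers every URL) would be one option.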

3) Pivot: for a supplied list of dates, or for ±5 days around a supplied date, and for a specific key metric, compare the metric across the dates:

+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
| time of day | date-5  |   date-4    |   date-3    |   date-2    |   date-1    |  date   |   date+1    |   date+2    |   date+3    |   date+4    | date+5  |
+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
| 14:30:00    | -0.3673 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -0.3673 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -0.3673 |
| 14:00:00    | 4.2187  | 0.5276      | 0.5276      | 0.5276      | 0.5276      | 4.2187  | 0.5276      | 0.5276      | 0.5276      | 0.5276      | 4.2187  |
| 13:30:00    | -2.3875 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -2.3875 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -2.3875 |
| 13:00:00    | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 | -4.4194 |
| 12:30:00    | -6.4021 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -6.4021 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -6.4021 |
| 12:00:00    | -1.7971 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -1.7971 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -1.7971 |
| 11:30:00    | 0.5276  | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 | 0.5276  | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 | 0.5276  |
| 11:00:00    | -0.4941 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -0.4941 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -0.4941 |
| 10:30:00    | -0.6783 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -0.6783 | 0.5276      | 0.5276      | 0.5276      | 0.5276      | -0.6783 |
| 10:00:00    | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 | -0.1293 |
| 9:30:00     | -0.2703 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -0.2703 | 4.2187      | 4.2187      | 4.2187      | 4.2187      | -0.2703 |
+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
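Because the set of date columns is not known in advance, this is where dynamic SQL comes in. The sketch below builds the column list with `GROUP_CONCAT` and runs it as a prepared statement; it does not need window functions. The table `url_metrics(ts, ..., metric_1)`, the variable names, the centre date, and `SUM` as the aggregate are assumptions.

```sql
-- Sketch only: assumes the hypothetical table url_metrics.
SET @center := '2018-05-01';              -- hypothetical centre date
SET SESSION group_concat_max_len = 8192;  -- the default (1024) may truncate the column list

-- Build one "SUM(CASE ...) AS `yyyy-mm-dd`" expression per date present in the range.
SELECT GROUP_CONCAT(
           CONCAT('SUM(CASE WHEN DATE(ts) = ''', d,
                  ''' THEN metric_1 END) AS `', d, '`')
           ORDER BY d SEPARATOR ', ')
FROM (
    SELECT DISTINCT DATE(ts) AS d
    FROM url_metrics
    WHERE ts >= DATE_SUB(@center, INTERVAL 5 DAY)
      AND ts <  DATE_ADD(@center, INTERVAL 6 DAY)
) AS days
INTO @cols;

-- Note: if no rows fall in the range, @cols is NULL and PREPARE will fail.
SET @sql := CONCAT(
    'SELECT TIME(ts) AS time_of_day, ', @cols,
    ' FROM url_metrics',
    ' WHERE ts >= DATE_SUB(@center, INTERVAL 5 DAY)',
    '   AND ts <  DATE_ADD(@center, INTERVAL 6 DAY)',
    ' GROUP BY time_of_day',
    ' ORDER BY time_of_day DESC');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

A view definition cannot refer to user variables or contain prepared statements, so a pivot like this would more likely live in a stored procedure than in a VIEW. The columns are labelled with the actual dates rather than `date-5` … `date+5`.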

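4) There is a metric called metric_lg which, based on its count, indicates a URL's lifecycle stage. So, starting from a specified date or from the group's first timestamp, I want to compute certain metric aggregates by that count. For a single URL the ranges would be 1-5, 5-10, 10-20, 20-50, 50-80, 80-200, 200-1000, 1000-10000, 10000+; let's call them stages A, B, C, D, E, F, G, H, I. However, this count needs to accumulate from when the group started, i.e., from the URL's appearance in the group. Suppose group 7184 was launched at 2018-05-01 10:00:00 and group 7174 at 2018-04-30 12:00:00. Then a URL that appears in both groups accumulates its metric_lg from the start of each respective group: its lifecycle stage in 7184 is based on metric_lg accumulated since 7184 started (2018-05-01 10:00:00), and its lifecycle stage in 7174 is based on metric_lg accumulated since 7174 started (2018-04-30 12:00:00).

So, for a supplied list of groups, something like this would help: compute other metric aggregates by metric_lg lifecycle stage, and compare group performance broken down by lifecycle stage.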

+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|      timestamp      | A_aggregate_metric | B_aggregate_metric  | C_aggregate_metric | D_aggregate_metric | E_aggregate_metric | F_aggregate_metric | G_aggregate_metric | H_aggregate_metric | I_aggregate_metric |
+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| 2018-05-01 14:30:00 | -0.3673            | 0.5276              | 0.5276             | 0.5276             | 0.5276             | -0.3673            | 0.5276             | 0.5276             | 0.5276             |
| 2018-05-01 14:00:00 | 4.2187             | 0.5276              | 0.5276             | 0.5276             | 0.5276             | 4.2187             | 0.5276             | 0.5276             | 0.5276             |
| 2018-05-01 13:30:00 | -2.3875            | 4.2187              | 4.2187             | 4.2187             | 4.2187             | -2.3875            | 4.2187             | 4.2187             | 4.2187             |
| 2018-05-01 13:00:00 | -4.4194            | 5.449066667         | 5.449066667        | 5.449066667        | 5.449066667        | -4.4194            | 5.449066667        | 5.449066667        | 5.449066667        |
| 2018-05-01 10:30:00 | -6.4021            | 4.2187              | 4.2187             | 4.2187             | 4.2187             | -6.4021            | 4.2187             | 4.2187             | 4.2187             |
| 2018-05-01 12:00:00 | -1.7971            | 0.5276              | 0.5276             | 0.5276             | 0.5276             | -1.7971            | 0.5276             | 0.5276             | 0.5276             |
| 2018-05-01 10:30:00 | 0.5276             | 7.985716667         | 7.985716667        | 7.985716667        | 7.985716667        | 0.5276             | 7.985716667        | 7.985716667        | 7.985716667        |
| 2018-05-01 11:00:00 | -0.4941            | 4.2187              | 4.2187             | 4.2187             | 4.2187             | -0.4941            | 4.2187             | 4.2187             | 4.2187             |
| 2018-05-01 10:30:00 | -0.6783            | 0.5276              | 0.5276             | 0.5276             | 0.5276             | -0.6783            | 0.5276             | 0.5276             | 0.5276             |
| 2018-05-01 10:00:00 | -0.1293            | 9.522366667         | 9.522366667        | 9.522366667        | 9.522366667        | -0.1293            | 9.522366667        | 9.522366667        | 9.522366667        |
| 2018-05-01 9:30:00  | -0.2703            | 4.2187              | 4.2187             | 4.2187             | 4.2187             | -0.2703            | 4.2187             | 4.2187             | 4.2187             |
+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
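The per-group accumulation described above maps naturally onto a running `SUM(...) OVER (PARTITION BY grp, url ORDER BY ts)`. The sketch below assumes MySQL 8.0+ and a hypothetical table `url_metrics(ts, grp, url, metric_1, metric_2, metric_lg)` that holds each group's rows from its start. Treating the ranges as half-open (5 belongs to B, not A), comparing the thresholds directly against the stored metric_lg values, and summing `metric_1` are assumptions.

```sql
-- Sketch only: assumes MySQL 8.0+ and the hypothetical table url_metrics.
WITH cumulative AS (
    -- Running total of metric_lg per (group, URL), starting at the
    -- URL's first row in that group.
    SELECT ts, grp, url, metric_1,
           SUM(metric_lg) OVER (PARTITION BY grp, url ORDER BY ts) AS lg_so_far
    FROM url_metrics
    WHERE grp IN (7174, 7184)            -- supplied list of groups
),
staged AS (
    SELECT ts, grp, metric_1,
           CASE
               WHEN lg_so_far < 5     THEN 'A'
               WHEN lg_so_far < 10    THEN 'B'
               WHEN lg_so_far < 20    THEN 'C'
               WHEN lg_so_far < 50    THEN 'D'
               WHEN lg_so_far < 80    THEN 'E'
               WHEN lg_so_far < 200   THEN 'F'
               WHEN lg_so_far < 1000  THEN 'G'
               WHEN lg_so_far < 10000 THEN 'H'
               ELSE 'I'
           END AS stage
    FROM cumulative
)
SELECT ts, grp,
       SUM(CASE WHEN stage = 'A' THEN metric_1 END) AS A_aggregate_metric,
       SUM(CASE WHEN stage = 'B' THEN metric_1 END) AS B_aggregate_metric,
       SUM(CASE WHEN stage = 'C' THEN metric_1 END) AS C_aggregate_metric,
       SUM(CASE WHEN stage = 'D' THEN metric_1 END) AS D_aggregate_metric,
       SUM(CASE WHEN stage = 'E' THEN metric_1 END) AS E_aggregate_metric,
       SUM(CASE WHEN stage = 'F' THEN metric_1 END) AS F_aggregate_metric,
       SUM(CASE WHEN stage = 'G' THEN metric_1 END) AS G_aggregate_metric,
       SUM(CASE WHEN stage = 'H' THEN metric_1 END) AS H_aggregate_metric,
       SUM(CASE WHEN stage = 'I' THEN metric_1 END) AS I_aggregate_metric
FROM staged
GROUP BY ts, grp
ORDER BY ts DESC, grp;
```

Any date filter belongs in the final `SELECT`, not in `cumulative`; filtering earlier would make the running total start at the filter boundary instead of at the group's start. Dropping `grp` from the final `SELECT` and `GROUP BY` gives one row per timestamp, as in the table above.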

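If you need context for the data, assume three metrics: metric_1: revenue in dollars; metric_2: cost in dollars; metric_lg: traffic count in thousands.

PS: Doing this in MySQL is preferred over Python, because some of these will be used to create custom VIEWs that can be looked at frequently and analyzed further.

That's a lot. Thank you very much, really.

0 answers:

No answers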