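I recently started learning SQL and have no prior coding experience, so this may just be a silly mistake (in which case, sorry for the long post :)). It would be great if you could help me with my current problem.
I have a table that looks like this:
id / n (the name of a specific event) / utc (timestamp) / json_data (a JSON string containing multiple parameters).
My goal is simple: I'm trying to get the sum of the value parameter found in json_data, grouped by n. Unfortunately, a few issues make this more complicated.
We've had a spamming issue that causes the same event to be sent hundreds or thousands of times, and those duplicates need to be filtered out. I usually solve this by putting utc (the timestamp) in the GROUP BY clause together with the other selected columns, which leaves one instance of each distinct event.
Some events return negative values in the "value field", and these need to be ignored in all counts and sums.
As if that weren't enough, the name of the value field in the json_data column is different depending on the type of event sent. I work around this with the various string manipulations you can see in the query.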
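To make the deduplication idea above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. The sample rows and the plain `amount` column are assumptions made up for illustration; in the real table the value sits inside json_data.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE application_events (id INTEGER, n TEXT, utc INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO application_events VALUES (?, ?, ?, ?)",
    [
        (1, "buy_alpha", 1000, 5),
        (1, "buy_alpha", 1000, 5),  # spam duplicate of the row above
        (1, "buy_alpha", 1000, 5),  # spam duplicate of the row above
        (1, "buy_alpha", 2000, 5),  # same event at a later time, so it is kept
        (2, "buy_beta", 1000, 7),
    ],
)

# Grouping on id, n and utc collapses identical events into one row.
# MAX() is only there so the non-grouped column may appear in the SELECT list.
deduped = conn.execute(
    """
    SELECT id, n, utc, MAX(amount) AS amount
    FROM application_events
    GROUP BY id, n, utc
    ORDER BY id, n, utc
    """
).fetchall()

print(deduped)
```

The three identical rows collapse into one, while the event with a different timestamp survives as its own row.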
Here is what I've got so far:
SELECT
    b.Event_Name as Event_Name
    , COUNT(b.Event_Name) as event_occurrences
    , SUM(b.item_value) as user_spendings
FROM
    (SELECT
        a.id as Player_ID
        , a.n as Event_Name
        , a.utc as timing
        , CASE
            WHEN
                MAX(a.ALPHA_Value
                    + a.BETA_Value
                    + a.GAMMA_Value
                    + a.DELTA_Value
                    + a.EPSILON_Value
                    + a.BETAUPGRADE_Value
                    + a.ZETA_Value
                    + a.ALPHASKIN_Value
                    + a.UPGRADEALPHA_Value) <= 0
            THEN 0
            ELSE
                MAX(a.ALPHA_Value
                    + a.BETA_Value
                    + a.GAMMA_Value
                    + a.DELTA_Value
                    + a.EPSILON_Value
                    + a.BETAUPGRADE_Value
                    + a.ZETA_Value
                    + a.ALPHASKIN_Value
                    + a.UPGRADEALPHA_Value) END as item_value
    FROM
        (SELECT
            id
            , n
            , utc
            , MAX(TRIM(get_json_object(json_data, '$.ALPHA_Value'))) as ALPHA_Value
            , MAX(TRIM(get_json_object(json_data, '$.BETA_Value'))) as BETA_Value
            , MAX(SUBSTR(
                TRIM(get_json_object(json_data, '$.GAMMA_Value')), 6,
                LOCATE(' resource 2',
                    SUBSTR(TRIM(get_json_object(json_data, '$.GAMMA_Value')), 6)) - 1)) as GAMMA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.DELTA_Value')), 6)) as DELTA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.EPSILON_Value')), 6)) as EPSILON_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.BETAUPGRADE_Value')), 6)) as BETAUPGRADE_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.ZETA_Value')), 6)) as ZETA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.ALPHASKIN_Value')), 6)) as ALPHASKIN_Value
            , MAX(SUBSTR(
                TRIM(get_json_object(json_data, '$.UPGRADEALPHA_Value')), 6,
                LOCATE(' resource 2',
                    SUBSTR(TRIM(get_json_object(json_data, '$.UPGRADEALPHA_Value')), 6)) - 1)) as UPGRADEALPHA_Value
        FROM application_events
        WHERE
            month = 201409
            AND FROM_UNIXTIME(utc_timestamp) > '2014-09-04 12:00:00'
        GROUP BY id, n, utc
        ORDER BY id, n
        ) a
    GROUP by a.id, a.n, a.utc
    ORDER by timing, Event_Name
    ) b
WHERE b.item_value > 0
GROUP by b.Event_Name
ORDER by user_spendings
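My reasoning is as follows:
1. I extract the values from json_data while removing the spam with GROUP BY id, n, utc. I wrap get_json_object in MAX so that I can group by the preceding columns. Since the combination of id, name and timestamp is unique (spam aside, of course), MAX simply returns that same value. Since each event has only one value field (named differently depending on the event type), I end up with all the columns, but only one of them holds a value (the others are empty).
2. I get rid of the negative values. Since I can't put a sum in the WHERE clause, the only way I could think of was to create another derived table (b) that checks whether the sum of all the value columns in a is negative (as I said, all but one of them are empty, so if that one is negative, so is the sum) and otherwise returns the sum (aliased as item_value).
3. The third, outermost query finally counts the events and sums the values.
My current problem is in step 2. When I run subquery a on its own, it looks fine and I get results. When I use it in my original query (the one that counts events and sums values), I also get results. So I suspect something is wrong with my condition, because the full query returns no rows at all.
I tried putting the sum in the WHERE clause, but that didn't work. Any ideas are welcome, especially if you know a simpler way to do this.
Thank you all very much.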
Answer 0 (score: 0)
Your query looks correct. I removed some extra parts (although that wasn't strictly necessary):
SELECT
    b.Event_Name as Event_Name
    , COUNT(b.Event_Name) as event_occurrences
    , SUM(b.item_value) as user_spendings
FROM (SELECT
        a.id as Player_ID
        , a.n as Event_Name
        , a.utc as timing
        , COALESCE(a.ALPHA_Value, CAST(0 AS BIGINT))
          + COALESCE(a.BETA_Value, CAST(0 AS BIGINT))
          + COALESCE(a.GAMMA_Value, CAST(0 AS BIGINT))
          + COALESCE(a.DELTA_Value, CAST(0 AS BIGINT))
          + COALESCE(a.EPSILON_Value, CAST(0 AS BIGINT))
          + COALESCE(a.BETAUPGRADE_Value, CAST(0 AS BIGINT))
          + COALESCE(a.ZETA_Value, CAST(0 AS BIGINT))
          + COALESCE(a.ALPHASKIN_Value, CAST(0 AS BIGINT))
          + COALESCE(a.UPGRADEALPHA_Value, CAST(0 AS BIGINT)) as item_value
    FROM (SELECT
            id
            , n
            , utc
            , MAX(TRIM(get_json_object(json_data, '$.ALPHA_Value'))) as ALPHA_Value
            , MAX(TRIM(get_json_object(json_data, '$.BETA_Value'))) as BETA_Value
            , MAX(SUBSTR(
                TRIM(get_json_object(json_data, '$.GAMMA_Value')), 6,
                LOCATE(' resource 2',
                    SUBSTR(TRIM(get_json_object(json_data, '$.GAMMA_Value')), 6)) - 1)) as GAMMA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.DELTA_Value')), 6)) as DELTA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.EPSILON_Value')), 6)) as EPSILON_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.BETAUPGRADE_Value')), 6)) as BETAUPGRADE_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.ZETA_Value')), 6)) as ZETA_Value
            , MAX(SUBSTR(TRIM(get_json_object(json_data, '$.ALPHASKIN_Value')), 6)) as ALPHASKIN_Value
            , MAX(SUBSTR(
                TRIM(get_json_object(json_data, '$.UPGRADEALPHA_Value')), 6,
                LOCATE(' resource 2',
                    SUBSTR(TRIM(get_json_object(json_data, '$.UPGRADEALPHA_Value')), 6)) - 1)) as UPGRADEALPHA_Value
        FROM application_events
        WHERE
            month = 201409
            AND FROM_UNIXTIME(utc_timestamp) > '2014-09-04 12:00:00'
        GROUP BY id, n, utc
        ) a
    ) b
WHERE b.item_value > 0
GROUP by b.Event_Name
ORDER by user_spendings
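I think some of the values you are adding together are NULL, so I added COALESCE. If so, that explains the empty result: in SQL, NULL plus any number is NULL, so the sum came out NULL for every row and the b.item_value > 0 filter discarded all of them.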
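The effect of NULL on the addition can be reproduced on a toy table. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of Hive, and the two-column table `a` with its rows is invented for the demo. NULL arithmetic behaves the same way in both engines.

```python
import sqlite3

# In-memory SQLite database standing in for subquery "a" (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n TEXT, alpha_value INTEGER, beta_value INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        ("buy_alpha", 5, None),  # only one value column is filled per event
        ("buy_beta", None, 7),
        ("refund", -3, None),    # negative value that should be ignored later
    ],
)

# Plain addition: a single NULL operand makes the whole sum NULL.
plain = conn.execute(
    "SELECT n, alpha_value + beta_value FROM a ORDER BY n"
).fetchall()

# With COALESCE every NULL is replaced by 0 before adding.
safe = conn.execute(
    "SELECT n, COALESCE(alpha_value, 0) + COALESCE(beta_value, 0) FROM a ORDER BY n"
).fetchall()

# Filtering the plain sum with "> 0" keeps nothing, because NULL > 0 is not true.
kept_plain = conn.execute(
    """
    SELECT n FROM (SELECT n, alpha_value + beta_value AS item_value FROM a)
    WHERE item_value > 0
    """
).fetchall()

print(plain)
print(safe)
print(kept_plain)
```

With plain addition every sum is NULL and the filter returns no rows, which matches the symptom in the question. With COALESCE the sums are 5, 7 and -3, so a later item_value > 0 filter only drops the refund.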
P.S. You don't need subquery "b"; you could do the same thing in subquery "a". I left it untouched for better readability.
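For what it's worth, here is one way the version without subquery "b" might look, again as a rough sketch on SQLite via Python's sqlite3 rather than on Hive. The table layout (plain `alpha_value` and `beta_value` columns instead of json_data) and the sample rows are assumptions chosen to keep the example short.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE application_events "
    "(id INTEGER, n TEXT, utc INTEGER, alpha_value INTEGER, beta_value INTEGER)"
)
conn.executemany(
    "INSERT INTO application_events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "buy_alpha", 1000, 5, None),
        (1, "buy_alpha", 1000, 5, None),  # spam duplicate
        (1, "buy_alpha", 1000, 5, None),  # spam duplicate
        (1, "buy_alpha", 2000, 4, None),
        (2, "buy_beta", 1000, None, 7),
        (2, "refund", 1500, -3, None),    # negative, must be ignored
    ],
)

# Subquery "a" deduplicates and computes item_value in one go,
# so the intermediate subquery "b" is no longer needed.
rows = conn.execute(
    """
    SELECT a.n AS Event_Name
         , COUNT(a.n) AS event_occurrences
         , SUM(a.item_value) AS user_spendings
    FROM (SELECT id, n, utc
               , COALESCE(MAX(alpha_value), 0)
                 + COALESCE(MAX(beta_value), 0) AS item_value
          FROM application_events
          GROUP BY id, n, utc
         ) a
    WHERE a.item_value > 0
    GROUP BY a.n
    ORDER BY user_spendings
    """
).fetchall()

print(rows)
```

The spam rows count once, the refund is filtered out, and the result is ordered by the summed value.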