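I have the following situation: a given resource is available for a fixed period of time and costs a fixed amount over that period. Users can access the resource during that period. I need to split the cost of the resource among the users who access it, keeping in mind that a user cannot be charged for time when he is not accessing it. Like this (example 1):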
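The red bar represents the availability of the resource over the entire period. The blue and green bars represent the times at which each user accessed the resource. Note that at time 9 nobody is accessing the resource, so nobody is charged. Given that the resource costs $100 for the entire period, user 1 would be charged $40 and user 2 $50. The remaining $10 would be lost.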
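The idea is simple: take the full cost of the resource and split it in proportion to the time each user used it. The problem arises when users use the resource concurrently (example 2):
In this case, at times 4 and 5, both users are using the same resource. For the overlapping time I need to divide the cost by 2 (the number of concurrent users) to get the correct values.
In other words: the more users using the resource at the same time, the cheaper it gets for each of them.
Of course, the problem can get more complex, for example (example 3):
Currently I have a table with the following structure (using example 3):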
+------------+--------+------------+----------+-------------------+--------------+
| ResourceId | UserId | UsageStart | UsageEnd | ResourceTotalCost | WeightedCost |
+------------+--------+------------+----------+-------------------+--------------+
| res1       | u1     | time 0     | time 1   | 100               | 20           |
| res1       | u1     | time 4     | time 7   | 100               | 40           |
| res1       | u2     | time 4     | time 8   | 100               | 50           |
| res1       | u3     | time 1     | time 4   | 100               | 40           |
| res1       | u3     | time 8     | time 8   | 100               | 10           |
+------------+--------+------------+----------+-------------------+--------------+
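I have the exact usage periods for each user, plus the total cost of the resource (for the entire analyzed period) and the weighted cost of the resource for the user (this is the column I want to improve).
The UsageStart and UsageEnd columns are timestamps with millisecond precision (which means that times can be 1 ms apart from each other). ResourceId and UserId are strings with no pattern (but guaranteed to be unique for each resource and user, respectively). ResourceTotalCost and WeightedCost are both floating-point numbers.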
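The output I need has the same shape as what I already have, but with the weighted cost taking the concurrent usage of the resource into account. For example 3, this is the expected output: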
+------------+--------+------------+----------+-------------------+--------------+
| ResourceId | UserId | UsageStart | UsageEnd | ResourceTotalCost | WeightedCost |
+------------+--------+------------+----------+-------------------+--------------+
| res1       | u1     | time 0     | time 1   | 100               | 15           |
| res1       | u1     | time 4     | time 7   | 100               | 18.33        |
| res1       | u2     | time 4     | time 8   | 100               | 23.33        |
| res1       | u3     | time 1     | time 4   | 100               | 28.33        |
| res1       | u3     | time 8     | time 8   | 100               | 5            |
+------------+--------+------------+----------+-------------------+--------------+
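For reference, the expected figures can be cross-checked with a short slot-by-slot calculation. This is only a Python sketch of the rule described in the question (one slot costs total / number of slots, shared equally by the distinct users present in that slot), not a BigQuery solution. The function name `weighted_costs` and the inclusive integer slots are assumptions made for illustration.

```python
from collections import defaultdict


def weighted_costs(rows, total_cost, n_slots):
    """rows: list of (user_id, start_slot, end_slot), both ends inclusive.

    Hypothetical helper: assumes integer time slots 0..n_slots-1.
    """
    slot_cost = total_cost / n_slots

    # Collect the distinct users present in each slot.
    users_in_slot = defaultdict(set)
    for user, start, end in rows:
        for slot in range(start, end + 1):
            users_in_slot[slot].add(user)

    # Each row pays slot_cost / (concurrent users) for every slot it covers.
    costs = []
    for user, start, end in rows:
        cost = sum(slot_cost / len(users_in_slot[s]) for s in range(start, end + 1))
        costs.append(round(cost, 2))
    return costs


example3 = [("u1", 0, 1), ("u1", 4, 7), ("u2", 4, 8), ("u3", 1, 4), ("u3", 8, 8)]
print(weighted_costs(example3, total_cost=100, n_slots=10))
# prints [15.0, 18.33, 23.33, 28.33, 5.0]
```

Slots that nobody uses (time 9 here) are simply never visited, so their cost is not charged to anyone.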
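So, any ideas on how to solve this? I considered using a UDF, but UDFs currently impose several limitations that I can't afford in my project (such as the limit of 6 UDF queries), so a pure BigQuery SQL solution would be preferred.
Thanks.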
Answer 0 (score: 3)
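Try the following; it is for BigQuery Standard SQL (see Enabling Standard SQL and Migrating from legacy SQL).
As you can see, I slightly adjusted your data (UsageStart and UsageEnd are integers).
I included all three examples, labeled res1, res2 and res3 in ResourceId.
I also added an extra entry for each resource to represent the resource's availability. These entries have NULL as UserId.
So the query is: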
WITH Usage AS (
SELECT ResourceId, UserId, UsageStart, UsageEnd + 1 AS UsageEnd, ResourceTotalCost
FROM (SELECT 'res3' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 1 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u1' AS UserId, 4 AS UsageStart, 7 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u3' AS UserId, 1 AS UsageStart, 4 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u3' AS UserId, 8 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 3 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 5 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost )
), iIntervals AS (
SELECT ResourceId, iStart, LEAD(iStart)
OVER(PARTITION BY ResourceId ORDER BY iStart) AS iEnd
FROM (
SELECT DISTINCT ResourceId, iStart FROM (
SELECT ResourceId, UsageStart AS iStart FROM Usage UNION ALL
SELECT ResourceId, UsageEnd AS iStart FROM Usage )
)
), iWeights AS (
SELECT iStart, iEnd, x.ResourceId, UserId, ResourceTotalCost,
SUM(iWeight / CASE WHEN Users = 0 THEN 1 ELSE Users END / width) AS iWeight
FROM (
SELECT iStart, iEnd, iEnd - iStart AS iWeight, iIntervals.ResourceId, UserId, ResourceTotalCost,
COUNT(DISTINCT UserId) OVER(PARTITION BY iIntervals.ResourceId, iStart, iEnd) AS Users
FROM iIntervals JOIN Usage
ON iIntervals.ResourceId = Usage.ResourceId
AND iStart >= UsageStart AND iEnd <= UsageEnd
WHERE iEnd IS NOT NULL ) AS x
JOIN (SELECT ResourceId, MAX(iEnd) - MIN(iStart) AS width FROM iIntervals GROUP BY 1) AS y
ON x.ResourceId = y.ResourceId WHERE NOT (UserId IS NULL AND Users > 0) GROUP BY 1, 2, 3, 4, 5
)
SELECT usage.ResourceId, usage.UserId, usage.UsageStart, usage.UsageEnd - 1 as UsageEnd,
iWeights.ResourceTotalCost, ROUND(SUM(iWeights.ResourceTotalCost * iWeight), 2) AS WeightedCost
FROM Usage JOIN iWeights
ON usage.ResourceId = iWeights.ResourceId AND usage.UserId = iWeights.UserId
AND iWeights.iStart BETWEEN usage.UsageStart AND usage.UsageEnd
AND iWeights.iEnd BETWEEN usage.UsageStart AND usage.UsageEnd
GROUP BY 1, 2, 3, 4, 5 ORDER BY 1, 2, 3
It produces the following output, which I believe is what you expect:
ResourceId UserId UsageStart UsageEnd ResourceTotalCost WeightedCost
res1 u1 0 3 100 40.0
res1 u2 4 8 100 50.0
res2 u1 0 5 100 50.0
res2 u2 4 8 100 40.0
res3 u1 0 1 100 15.0
res3 u1 4 7 100 18.33
res3 u2 4 8 100 23.33
res3 u3 1 4 100 28.33
res3 u3 8 8 100 5.0
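Your UsageStart and UsageEnd are most likely not integers, so you will need to adjust the above solution accordingly. My focus was on giving you an example of the logic.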
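The logic behind the query can be read as three steps: collect every start and end as a breakpoint, treat consecutive breakpoints as elementary intervals, and split each interval's share of the cost among the distinct users covering it. Below is a minimal Python sketch of that idea, not a replacement for the SQL. The function name `split_costs`, the half-open `[start, end)` bounds and the explicit availability window are assumptions made for illustration; the bounds can be any numbers, for example millisecond offsets. It also assumes that rows of the same user do not overlap each other.

```python
def split_costs(rows, total_cost, avail_start, avail_end):
    """rows: list of (user_id, start, end) with half-open [start, end) bounds.

    Hypothetical helper: avail_start/avail_end give the availability window.
    """
    # 1. Every start/end is a breakpoint; neighbours form elementary intervals.
    points = sorted({p for _, s, e in rows for p in (s, e)})
    width = avail_end - avail_start
    costs = [0.0] * len(rows)

    for lo, hi in zip(points, points[1:]):
        # 2. Rows that fully cover this elementary interval.
        covering = [i for i, (_, s, e) in enumerate(rows) if s <= lo and hi <= e]
        users = {rows[i][0] for i in covering}
        if not users:
            continue  # nobody uses the resource here, so nobody is charged

        # 3. The interval's share of the total, divided by concurrent users.
        share = total_cost * (hi - lo) / width / len(users)
        for i in covering:
            costs[i] += share

    return [round(c, 2) for c in costs]


# Example 3 expressed as millisecond offsets, availability window 0..10000 ms.
rows_ms = [
    ("u1", 0, 2000), ("u1", 4000, 8000), ("u2", 4000, 9000),
    ("u3", 1000, 5000), ("u3", 8000, 9000),
]
print(split_costs(rows_ms, 100, 0, 10000))
# prints [15.0, 18.33, 23.33, 28.33, 5.0]
```

Because the weights come from differences between breakpoints, the same code works whether the bounds are slot numbers or millisecond timestamps.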
Hope this helps! It can most likely be optimized as well.
Answer 1 (score: 1)
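Forgive me for answering with a MySQL procedure rather than BigQuery functions, but it should still be useful. I had to make some assumptions about your model:
UsageTable has an ID primary key column.
You can do the loop with 1 select and 1 update query: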
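CREATE PROCEDURE `calculateUsage`()
BEGIN
-- Assumes WeightedCost starts at 0 and the availability window is 10 slots (0..9).
set @slot = 0;
repeat
-- Cost of one slot divided by the number of rows using the resource in that slot.
-- max() keeps the query valid when ONLY_FULL_GROUP_BY is enabled.
set @increase =
(
select max(slotcost)/count(id) as usercost
from (
select id, ResourceTotalCost/10 as slotcost
from UsageTable
where @slot between UsageStart and UsageEnd
) as x
);
-- Add that share to every row active in the slot (no rows match for an unused slot).
update UsageTable
set WeightedCost = WeightedCost + @increase
where @slot between UsageStart and UsageEnd;
set @slot = @slot + 1;
until @slot = 10 end repeat;
END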
Running that procedure updates the table to:
ID RID UID St End RTC WC
1 1 1 0 1 100.00 15.00
2 1 1 4 7 100.00 18.33
3 1 2 4 8 100.00 23.33
4 1 3 1 4 100.00 28.33
5 1 3 8 8 100.00 5.00
Answer 2 (score: 1)
Below is an answer for the adjusted/corrected question:
UsageStart and UsageEnd columns are timestamps, and have millisecond precision (which means that times can be 1ms apart from each other).
ResourceId and UserId are strings with no pattern (but guaranteed to be unique for each resource and user, respectively).
ResourceTotalCost and WeightedCost are both float numbers.
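I didn't want to alter my previous answer, because it is still correct (and covers scenarios with discrete intervals), so I hope someone will still find it useful.
So, now, the new query (still BigQuery Standard SQL, of course):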
WITH Usage AS (
SELECT ResourceId, UserId, UsageStart, UsageEnd AS UsageEnd, ResourceTotalCost
FROM (
SELECT 'res3' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:02.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:08.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u3' AS UserId, TIMESTAMP '2016-01-01 01:00:01.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:05.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, 'u3' AS UserId, TIMESTAMP '2016-01-01 01:00:08.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res3' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res1' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:06.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
SELECT 'res2' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost )
), iIntervals AS (
SELECT ResourceId, iStart, LEAD(iStart)
OVER(PARTITION BY ResourceId ORDER BY iStart) AS iEnd
FROM (
SELECT DISTINCT ResourceId, iStart FROM (
SELECT ResourceId, UsageStart AS iStart FROM Usage UNION ALL
SELECT ResourceId, UsageEnd AS iStart FROM Usage )
)
), iWeights AS (
SELECT iStart, iEnd, x.ResourceId, UserId, ResourceTotalCost,
SUM(iWeight / CASE WHEN Users = 0 THEN 1 ELSE Users END / width) AS iWeight
FROM (
SELECT iStart, iEnd, TIMESTAMP_DIFF(iEnd, iStart, MILLISECOND) AS iWeight, iIntervals.ResourceId, UserId, ResourceTotalCost,
COUNT(DISTINCT UserId) OVER(PARTITION BY iIntervals.ResourceId, iStart, iEnd) AS Users
FROM iIntervals JOIN Usage
ON iIntervals.ResourceId = Usage.ResourceId
AND iStart >= UsageStart AND iEnd <= UsageEnd
WHERE iEnd IS NOT NULL ) AS x
JOIN (SELECT ResourceId, MAX(UNIX_MILLIS(iEnd)) - MIN(UNIX_MILLIS(iStart)) AS width FROM iIntervals GROUP BY 1) AS y
ON x.ResourceId = y.ResourceId WHERE NOT (UserId IS NULL AND Users > 0) GROUP BY 1, 2, 3, 4, 5
)
SELECT usage.ResourceId, usage.UserId, usage.UsageStart, usage.UsageEnd as UsageEnd,
iWeights.ResourceTotalCost, ROUND(SUM(iWeights.ResourceTotalCost * iWeight), 2) AS WeightedCost
FROM Usage JOIN iWeights
ON usage.ResourceId = iWeights.ResourceId AND usage.UserId = iWeights.UserId
AND iWeights.iStart BETWEEN usage.UsageStart AND usage.UsageEnd
AND iWeights.iEnd BETWEEN usage.UsageStart AND usage.UsageEnd
GROUP BY 1, 2, 3, 4, 5 ORDER BY 1, 2, 3
The output matches that of the previous example because the intervals are the same (even though start and end are now shown as TIMESTAMPs):
ResourceId UserId UsageStart UsageEnd ResourceTotalCost WeightedCost
res1 u1 2016-01-01 01:00:00 UTC 2016-01-01 01:00:04 UTC 100.0 40.0
res1 u2 2016-01-01 01:00:04 UTC 2016-01-01 01:00:09 UTC 100.0 50.0
res2 u1 2016-01-01 01:00:00 UTC 2016-01-01 01:00:06 UTC 100.0 50.0
res2 u2 2016-01-01 01:00:04 UTC 2016-01-01 01:00:09 UTC 100.0 40.0
res3 u1 2016-01-01 01:00:00 UTC 2016-01-01 01:00:02 UTC 100.0 15.0
res3 u1 2016-01-01 01:00:04 UTC 2016-01-01 01:00:08 UTC 100.0 18.33
res3 u2 2016-01-01 01:00:04 UTC 2016-01-01 01:00:09 UTC 100.0 23.33
res3 u3 2016-01-01 01:00:01 UTC 2016-01-01 01:00:05 UTC 100.0 28.33
res3 u3 2016-01-01 01:00:08 UTC 2016-01-01 01:00:09 UTC 100.0 5.0