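Splitting a resource's cost between users, taking concurrent usage into account

Time: 2016-10-21 13:47:22

Tags: google-bigquery

Question

I have the following scenario: a given resource is available for a fixed period of time and costs a fixed amount over that period. I have users who can access the resource during that period. I need to split the resource's cost among the users who access it, given that a user cannot be charged for time when they are not accessing it. Like this: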

example 1

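The red bar represents the availability of the resource over the whole period. The blue and green bars represent the times at which individual users access the resource. Note that at time 9 nobody is accessing the resource, so nobody is charged for it. With a resource cost of $100 for the whole period, user 1 is charged $40 and user 2 is charged $50. The remaining $10 is lost.

The idea is simple: take the full cost of the resource and split it in proportion to the time each user spends using it. The problem arises when the resource is used concurrently: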

example 2

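In this case, at times 4 and 5, both users are using the same resource. For that overlapping time, I need to divide the cost by 2 (the number of concurrent users) to get the correct values.

In other words: the more users I have using the resource at the same time, the cheaper it is for each of them.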
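To make the rule concrete, here is a minimal Python sketch of the slot-by-slot arithmetic, assuming 10 integer time slots (0 to 9) at $10 each. The function name `split_by_slot` and the slot ranges passed to it are illustrative assumptions, not part of the original data.

```python
def split_by_slot(usages, total_cost, n_slots):
    """Split total_cost over n_slots equal slots.

    usages maps a user to a list of (first_slot, last_slot) pairs,
    with both bounds inclusive (an assumption of this sketch).
    Returns (amount charged per user, amount nobody is charged for).
    """
    slot_cost = total_cost / n_slots
    charged = {user: 0.0 for user in usages}
    lost = 0.0
    for slot in range(n_slots):
        # Users that are accessing the resource during this slot.
        active = [user for user, spans in usages.items()
                  if any(first <= slot <= last for first, last in spans)]
        if not active:
            lost += slot_cost  # nobody can be charged for an idle slot
            continue
        for user in active:
            # Concurrent users share the slot's cost equally.
            charged[user] += slot_cost / len(active)
    return charged, lost


# No overlap: each user pays for the slots they use, the idle slot is lost.
print(split_by_slot({"u1": [(0, 3)], "u2": [(4, 8)]}, 100, 10))
# Overlap at slots 4 and 5: those two slots are split between both users.
print(split_by_slot({"u1": [(0, 5)], "u2": [(4, 8)]}, 100, 10))
```

With the first input this gives $40 and $50 with $10 lost, as in the first example. With the overlapping input the shared slots cost each user $5 instead of $10.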

Of course, the problem can get more complicated, for example:

example 3

Data

I currently have a table with the following structure (using example 3):

+--------------------------------------------------------------------------+
|ResourceId |UserId |UsageStart |UsageEnd |ResourceTotalCost |WeightedCost |
+--------------------------------------------------------------------------+
|res1       |u1     |time 0     |time 1   |100               |20           |
|res1       |u1     |time 4     |time 7   |100               |40           |
|res1       |u2     |time 4     |time 8   |100               |50           |
|res1       |u3     |time 1     |time 4   |100               |40           |
|res1       |u3     |time 8     |time 8   |100               |10           |
+--------------------------------------------------------------------------+

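I have the exact usage period of each user, plus the total cost of the resource (for the whole analysis period) and the user's weighted cost of the resource (this is the column I want to improve).

The UsageStart and UsageEnd columns are timestamps with millisecond precision (which means that times can be 1 ms apart from each other). ResourceId and UserId are strings with no pattern (but guaranteed to be unique for each resource and user, respectively). ResourceTotalCost and WeightedCost are both floating-point numbers.

Output

The output I need has the same shape as what I already have, but with the weighted cost taking concurrent use of the resource between users into account. For example 3, this is the expected output: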

+--------------------------------------------------------------------------+
|ResourceId |UserId |UsageStart |UsageEnd |ResourceTotalCost |WeightedCost |
+--------------------------------------------------------------------------+
|res1       |u1     |time 0     |time 1   |100               |15           |
|res1       |u1     |time 4     |time 7   |100               |18.33        |
|res1       |u2     |time 4     |time 8   |100               |23.33        |
|res1       |u3     |time 1     |time 4   |100               |28.33        |
|res1       |u3     |time 8     |time 8   |100               |5            |
+--------------------------------------------------------------------------+

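So, any ideas on how to solve this? I considered using a UDF, but at the moment UDFs impose several limitations that I can't afford in my project (such as the limit of 6 UDF queries), so a pure BigQuery SQL solution would be preferable.

Thanks.

3 Answers:

Answer 0 (score: 3)

Try the following. It is written for BigQuery Standard SQL (see Enabling Standard SQL and Migrating from legacy SQL).

As you can see, I slightly adjusted your data (UsageStart and UsageEnd are integers).
I put all three examples in as res1, res2, and res3 respectively in ResourceId.
I also added an extra entry for each resource to represent the resource's availability. These entries have a NULL UserId.

So the query is: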

WITH Usage AS (
  SELECT ResourceId, UserId, UsageStart, UsageEnd + 1 AS UsageEnd, ResourceTotalCost 
  FROM (SELECT 'res3' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 1 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u1' AS UserId, 4 AS UsageStart, 7 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, 1 AS UsageStart, 4 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, 8 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 3 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 5 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost )
), iIntervals AS (
  SELECT ResourceId, iStart, LEAD(iStart) 
    OVER(PARTITION BY ResourceId ORDER BY iStart) AS iEnd
  FROM (
    SELECT DISTINCT ResourceId, iStart FROM (
      SELECT ResourceId, UsageStart AS iStart FROM Usage UNION ALL 
      SELECT ResourceId, UsageEnd AS iStart FROM Usage )
  )
), iWeights AS (
  SELECT iStart, iEnd, x.ResourceId, UserId, ResourceTotalCost, 
    SUM(iWeight / CASE WHEN Users = 0 THEN 1 ELSE Users END / width) AS iWeight
  FROM (
    SELECT iStart, iEnd, iEnd - iStart AS iWeight, iIntervals.ResourceId, UserId, ResourceTotalCost, 
      COUNT(DISTINCT UserId) OVER(PARTITION BY iIntervals.ResourceId, iStart, iEnd) AS Users 
    FROM iIntervals JOIN Usage
    ON iIntervals.ResourceId = Usage.ResourceId
    AND iStart >= UsageStart AND iEnd <= UsageEnd 
    WHERE iEnd IS NOT NULL ) AS x
  JOIN (SELECT ResourceId, MAX(iEnd) - MIN(iStart) AS width FROM iIntervals GROUP BY 1) AS y
  ON x.ResourceId = y.ResourceId WHERE NOT (UserId IS NULL AND Users > 0) GROUP BY 1, 2, 3, 4, 5
)
SELECT usage.ResourceId, usage.UserId, usage.UsageStart, usage.UsageEnd - 1 as UsageEnd, 
  iWeights.ResourceTotalCost, ROUND(SUM(iWeights.ResourceTotalCost * iWeight), 2) AS WeightedCost 
FROM Usage JOIN iWeights 
ON usage.ResourceId = iWeights.ResourceId AND usage.UserId = iWeights.UserId
AND iWeights.iStart BETWEEN usage.UsageStart AND usage.UsageEnd
AND iWeights.iEnd BETWEEN usage.UsageStart AND usage.UsageEnd
GROUP BY 1, 2, 3, 4, 5 ORDER BY 1, 2, 3

The output is below, and I think it is what you expect:

ResourceId  UserId  UsageStart  UsageEnd    ResourceTotalCost   WeightedCost     
res1            u1           0         3                100          40.0    
res1            u2           4         8                100          50.0    
res2            u1           0         5                100          50.0    
res2            u2           4         8                100          40.0    
res3            u1           0         1                100          15.0    
res3            u1           4         7                100          18.33   
res3            u2           4         8                100          23.33   
res3            u3           1         4                100          28.33   
res3            u3           8         8                100           5.0    
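For readers who find the CTEs hard to follow, the same idea can be sketched in Python: every start and end becomes a boundary, consecutive boundaries form elementary intervals, and each elementary interval's share of the cost is divided by the number of distinct users covering it. This is only an illustration of the logic, not a translation of the query. The function name `weighted_costs` is hypothetical, intervals are half-open (the inclusive end plus 1, as in the `Usage` CTE), and it assumes that one user's own rows never overlap each other.

```python
def weighted_costs(rows, total_cost, avail_start, avail_end):
    """Weighted cost per usage row.

    rows: list of (user, start, end) with half-open [start, end) intervals.
    avail_start/avail_end: availability window of the resource.
    Assumes rows belonging to the same user do not overlap.
    """
    # Every start and end is a boundary (roughly what iIntervals collects).
    bounds = sorted({avail_start, avail_end}
                    | {t for _, start, end in rows for t in (start, end)})
    width = avail_end - avail_start
    costs = [0.0] * len(rows)
    # Consecutive boundaries form the elementary intervals.
    for lo, hi in zip(bounds, bounds[1:]):
        # Rows that fully cover this elementary interval.
        active = [i for i, (_, start, end) in enumerate(rows)
                  if start <= lo and hi <= end]
        users = {rows[i][0] for i in active}
        if not users:
            continue  # idle interval: its share of the cost is lost
        share = total_cost * (hi - lo) / width / len(users)
        for i in active:
            costs[i] += share
    return [round(cost, 2) for cost in costs]


# The res3 rows from the query above, with each end already shifted by +1.
rows = [("u1", 0, 2), ("u1", 4, 8), ("u2", 4, 9), ("u3", 1, 5), ("u3", 8, 9)]
print(weighted_costs(rows, 100, 0, 10))
```

For the res3 rows this yields the same five WeightedCost values as the table above, in the order the rows are listed. The number of loop iterations depends on the number of distinct boundaries, not on the length of the availability window.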

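Your UsageStart and UsageEnd are most likely not integers, so you will need to adjust the above solution accordingly. My focus was on giving you a reasonable example of the logic.

Hope this helps! It can probably be optimized too.

Answer 1 (score: 1)

Forgive me for answering with a MySQL procedure rather than BigQuery functions, but it should still be useful. I had to make a few assumptions about your model:

  • The resource in question is charged over all time slots; in your example that is 10 slots ($10 each), even though your data only uses 9
  • It is reasonable to assume that you can run O(n) queries over all possible time slots
  • Your UsageTable has an ID primary key column

You can do the loop with one SELECT and one UPDATE query: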

CREATE PROCEDURE `calculateUsage`()
BEGIN

set @slot = 0;
repeat
set @increase = 
(
select slotcost/count(id) as usercost
from (
select id, ResourceTotalCost/10 as slotcost
from UsageTable
where @slot between UsageStart and UsageEnd
) as x
);

update UsageTable
set WeightedCost = WeightedCost + @increase
where @slot between UsageStart and UsageEnd;

set @slot = @slot + 1;
until @slot = 10 end repeat;

END

Running the procedure updates the table to:

ID  RID UID St  End RTC     WC
1   1   1   0   1   100.00  15.00
2   1   1   4   7   100.00  18.33
3   1   2   4   8   100.00  23.33
4   1   3   1   4   100.00  28.33
5   1   3   8   8   100.00  5.00

Answer 2 (score: 1)

Below is an answer for the adjusted/corrected question:

UsageStart and UsageEnd columns are timestamps, and have millisecond precision (which means that times can be 1ms apart from each other).   
ResourceId and UserId are strings with no pattern (but guaranteed to be unique for each resource and user, respectively).   
ResourceTotalCost and WeightedCost are both float numbers.

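I don't want to overwrite my previous answer, because it is still correct (and covers scenarios with discrete intervals and the like), so I still hope someone will find it useful.

So, now, the new query (still BigQuery Standard SQL, of course):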

WITH Usage AS (
  SELECT ResourceId, UserId, UsageStart, UsageEnd AS UsageEnd, ResourceTotalCost 
  FROM (
    SELECT 'res3' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:02.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:08.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, TIMESTAMP '2016-01-01 01:00:01.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:05.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, TIMESTAMP '2016-01-01 01:00:08.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u1' AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:06.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u2' AS UserId, TIMESTAMP '2016-01-01 01:00:04.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:09.000' AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, NULL AS UserId, TIMESTAMP '2016-01-01 01:00:00.000' AS UsageStart, TIMESTAMP '2016-01-01 01:00:10.000' AS UsageEnd, 100 AS ResourceTotalCost )
), iIntervals AS (
  SELECT ResourceId, iStart, LEAD(iStart) 
    OVER(PARTITION BY ResourceId ORDER BY iStart) AS iEnd
  FROM (
    SELECT DISTINCT ResourceId, iStart FROM (
      SELECT ResourceId, UsageStart AS iStart FROM Usage UNION ALL 
      SELECT ResourceId, UsageEnd AS iStart FROM Usage )
  )
), iWeights AS (
  SELECT iStart, iEnd, x.ResourceId, UserId, ResourceTotalCost, 
    SUM(iWeight / CASE WHEN Users = 0 THEN 1 ELSE Users END / width) AS iWeight
  FROM (
    SELECT iStart, iEnd, TIMESTAMP_DIFF(iEnd, iStart, MILLISECOND) AS iWeight, iIntervals.ResourceId, UserId, ResourceTotalCost, 
      COUNT(DISTINCT UserId) OVER(PARTITION BY iIntervals.ResourceId, iStart, iEnd) AS Users 
    FROM iIntervals JOIN Usage
    ON iIntervals.ResourceId = Usage.ResourceId
    AND iStart >= UsageStart AND iEnd <= UsageEnd 
    WHERE iEnd IS NOT NULL ) AS x
  JOIN (SELECT ResourceId, MAX(UNIX_MILLIS(iEnd)) - MIN(UNIX_MILLIS(iStart)) AS width FROM iIntervals GROUP BY 1) AS y
  ON x.ResourceId = y.ResourceId WHERE NOT (UserId IS NULL AND Users > 0) GROUP BY 1, 2, 3, 4, 5
)
SELECT usage.ResourceId, usage.UserId, usage.UsageStart, usage.UsageEnd as UsageEnd, 
  iWeights.ResourceTotalCost, ROUND(SUM(iWeights.ResourceTotalCost * iWeight), 2) AS WeightedCost 
FROM Usage JOIN iWeights 
ON usage.ResourceId = iWeights.ResourceId AND usage.UserId = iWeights.UserId
AND iWeights.iStart BETWEEN usage.UsageStart AND usage.UsageEnd
AND iWeights.iEnd BETWEEN usage.UsageStart AND usage.UsageEnd
GROUP BY 1, 2, 3, 4, 5 ORDER BY 1, 2, 3

The output matches the previous example because the intervals are the same (even though the start and end are now shown as TIMESTAMPs):

ResourceId UserId UsageStart                 UsageEnd         ResourceTotalCost WeightedCost     
      res1    u1  2016-01-01 01:00:00 UTC  2016-01-01 01:00:04 UTC         100.0    40.0     
      res1    u2  2016-01-01 01:00:04 UTC  2016-01-01 01:00:09 UTC         100.0    50.0     
      res2    u1  2016-01-01 01:00:00 UTC  2016-01-01 01:00:06 UTC         100.0    50.0     
      res2    u2  2016-01-01 01:00:04 UTC  2016-01-01 01:00:09 UTC         100.0    40.0     
      res3    u1  2016-01-01 01:00:00 UTC  2016-01-01 01:00:02 UTC         100.0    15.0     
      res3    u1  2016-01-01 01:00:04 UTC  2016-01-01 01:00:08 UTC         100.0    18.33    
      res3    u2  2016-01-01 01:00:04 UTC  2016-01-01 01:00:09 UTC         100.0    23.33    
      res3    u3  2016-01-01 01:00:01 UTC  2016-01-01 01:00:05 UTC         100.0    28.33    
      res3    u3  2016-01-01 01:00:08 UTC  2016-01-01 01:00:09 UTC         100.0     5.0
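The reason this carries over to millisecond timestamps is that only the boundaries matter: the number of elementary intervals grows with the number of usage rows, not with the number of milliseconds in the window. A small Python sketch of that step follows, using the res2 rows from the query above. The helper names `elementary_intervals` and `at` are hypothetical, and the millisecond width plays the role of `TIMESTAMP_DIFF(iEnd, iStart, MILLISECOND)`.

```python
from datetime import datetime, timedelta


def elementary_intervals(rows, avail_start, avail_end):
    """Return (start, end, width_ms, distinct_users) per elementary interval.

    rows: list of (user, start, end) with datetime bounds, treated as
    half-open [start, end) intervals.
    """
    bounds = sorted({avail_start, avail_end}
                    | {t for _, start, end in rows for t in (start, end)})
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Distinct users whose usage fully covers this interval.
        users = {user for user, start, end in rows if start <= lo and hi <= end}
        # Integer width in milliseconds.
        width_ms = (hi - lo) // timedelta(milliseconds=1)
        result.append((lo, hi, width_ms, len(users)))
    return result


t0 = datetime(2016, 1, 1, 1, 0, 0)


def at(ms):
    """Timestamp ms milliseconds after t0 (helper for this example only)."""
    return t0 + timedelta(milliseconds=ms)


# res2 from the query above: u1 uses 0 s to 6 s, u2 uses 4 s to 9 s.
rows = [("u1", at(0), at(6000)), ("u2", at(4000), at(9000))]
for start, end, width_ms, n_users in elementary_intervals(rows, at(0), at(10000)):
    print(start.time(), end.time(), width_ms, n_users)
```

Each interval's cost is then `total_cost * width_ms / total_width_ms / distinct_users`, and intervals with zero users are charged to nobody.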