Count the number of cards with a positive balance at the current date

Time: 2019-02-14 08:46:00

Tags: sql hive

I have the following table:

|client| card| date               | balance|
--------------------------------------------
|  1   | 123 | 10-01-2018 10:04:36|   1000 |
|  1   | 321 | 10-01-2018 10:07:28|   2980 |
|  1   | 321 | 10-01-2018 11:23:34|  -100  |
|  1   | 123 | 10-01-2018 12:32:33|  -200  |
|  1   | 123 | 10-01-2018 12:44:43|   100  |
|  1   | 321 | 10-01-2018 14:00:28|   2000 |
|  1   | 321 | 10-01-2018 14:00:28|  -2100 |

In this table we can see that one client has two bank cards. The balance column is the card's balance at the given time. For each moment, I want to get the number of cards with a positive balance. This is what I want to see in the result:

|client| card| date               | balance| bal_pos|
-----------------------------------------------------
|  1   | 123 | 10-01-2018 10:04:36|   1000 |   1    |
|  1   | 321 | 10-01-2018 10:07:28|   2980 |   2    |
|  1   | 321 | 10-01-2018 11:23:34|  -100  |   1    |
|  1   | 123 | 10-01-2018 12:32:33|  -200  |   0    |
|  1   | 123 | 10-01-2018 12:44:43|   100  |   1    |
|  1   | 321 | 10-01-2018 14:00:28|   2000 |   2    |
|  1   | 321 | 10-01-2018 14:00:28|  -2100 |   1    |

How can I calculate the number of cards with a positive balance (the bal_pos attribute)?

P.S. I really don't know how to count the cards with a positive balance when the date is the same (the last two rows of the table).

P.P.S. The same applies to the next example:

|client| card| date               | balance|
--------------------------------------------
|  1   | 123 | 10-01-2018 10:04:36|   1000 |
|  1   | 321 | 10-01-2018 10:07:28|   2980 |
|  1   | 321 | 10-01-2018 11:23:34|   100  |
|  1   | 123 | 10-01-2018 12:32:33|   200  |
|  1   | 123 | 10-01-2018 12:44:43|   100  |
|  1   | 321 | 10-01-2018 14:00:28|   2000 |
|  1   | 321 | 10-01-2018 14:00:28|   2100 |

In this case I expect bal_pos never to exceed 2, because the client has only 2 cards in this table and the number of cards with a positive balance cannot be greater than 2.

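1 answer:

Answer 0: (score: 1)

Can you try the following query?

Since even the timestamps can be identical, you can define a window clause on the analytic function. (I used an int column, orderBy, for the ordering instead of the timestamp.)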

with t1 as (select 1 as client, 123 as card, 1 as orderBy, 1000 as bal 
union  
select 1 as client, 321 as card, 2 as orderBy, 2980 as bal 
union  
select 1 as client, 321 as card, 3 as orderBy, -100 as bal 
union  
select 1 as client, 123 as card, 4 as orderBy, -200 as bal 
union  
select 1 as client, 123 as card, 5 as orderBy, 100 as bal 
union  
select 1 as client, 321 as card, 6 as orderBy, 2000 as bal 
union  
select 1 as client, 321 as card, 6 as orderBy, -2100 as bal)
,res1 as (
  -- +1 for a positive balance, -1 otherwise
  select client, card, orderBy, bal,
         case when bal > 0 then 1 else -1 end as bal_type
  from t1)
select client, card, orderBy, bal,
       sum(bal_type) over (
         -- "bal desc" only reproduces your expected output; when the time is the same,
         -- you can never be sure which balance has to be considered first
         order by orderBy asc, bal desc
         rows between unbounded preceding and current row) as bal_pos
from res1;

Result:

1       123     1       1000    1
1       321     2       2980    2
1       321     3       -100    1
1       123     4       -200    0
1       123     5       100     1
1       321     6       2000    2
1       321     6       -2100   1
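
The running sum of +1/-1 above gives the expected numbers only while each card keeps alternating between a positive and a non-positive balance. For the all-positive example in the question it would keep growing past 2. One possible variant, which is not part of the original answer, adds +1 or -1 only when a card's sign actually changes, using `lag()` per card. The sketch below runs the SQL through Python's `sqlite3` module purely so that it can be executed locally; it assumes a SQLite build with window functions (3.25 or later), and the names `ORIGINAL`, `ALL_POSITIVE`, `QUERY` and `count_positive_cards` are made up for this illustration. Hive has `lag()` and window frames too, but the query has not been checked against Hive here.

```python
import sqlite3

# Rows are (client, card, orderBy, bal), copied from the examples in the question.
ORIGINAL = [
    (1, 123, 1, 1000), (1, 321, 2, 2980), (1, 321, 3, -100), (1, 123, 4, -200),
    (1, 123, 5, 100), (1, 321, 6, 2000), (1, 321, 6, -2100),
]
ALL_POSITIVE = [
    (1, 123, 1, 1000), (1, 321, 2, 2980), (1, 321, 3, 100), (1, 123, 4, 200),
    (1, 123, 5, 100), (1, 321, 6, 2000), (1, 321, 6, 2100),
]

QUERY = """
WITH flagged AS (
    SELECT client, card, orderBy, bal,
           CASE WHEN bal > 0 THEN 1 ELSE 0 END AS is_pos
    FROM t1
),
deltas AS (
    -- +1 when a card becomes positive, -1 when it stops being positive, else 0
    SELECT client, card, orderBy, bal,
           is_pos - LAG(is_pos, 1, 0) OVER (
               PARTITION BY client, card ORDER BY orderBy, bal DESC
           ) AS delta
    FROM flagged
)
SELECT SUM(delta) OVER (
           PARTITION BY client ORDER BY orderBy, bal DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS bal_pos
FROM deltas
ORDER BY client, orderBy, bal DESC
"""


def count_positive_cards(rows):
    """Load rows into an in-memory table and return the bal_pos column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (client INT, card INT, orderBy INT, bal INT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY)]
    con.close()
    return result


print(count_positive_cards(ORIGINAL))      # [1, 2, 1, 0, 1, 2, 1]
print(count_positive_cards(ALL_POSITIVE))  # [1, 2, 2, 2, 2, 2, 2]
```

The tie on orderBy = 6 is still broken arbitrarily by `bal desc`, so the caveat about identical timestamps applies here as well.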

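If you do not specify the window (frame) clause, the sum is not computed row by row. It is computed over a range instead, so rows that share the same orderBy value all get the same total. Check the result of the following query.

Query: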

with t1 as (select 1 as client, 123 as card, 1 as orderBy, 1000 as bal 
union  
select 1 as client, 321 as card, 2 as orderBy, 2980 as bal 
union  
select 1 as client, 321 as card, 3 as orderBy, -100 as bal 
union  
select 1 as client, 123 as card, 4 as orderBy, -200 as bal 
union  
select 1 as client, 123 as card, 5 as orderBy, 100 as bal 
union  
select 1 as client, 321 as card, 6 as orderBy, 2000 as bal 
union  
select 1 as client, 321 as card, 6 as orderBy, -2100 as bal)
,res1 as (
  -- +1 for a positive balance, -1 otherwise
  select client, card, orderBy, bal,
         case when bal > 0 then 1 else -1 end as bal_type
  from t1)
select client, card, orderBy, bal,
       sum(bal_type) over (
         order by orderBy
         -- uncomment the next line to check: the results are the same with the range clause
         -- range between unbounded preceding and current row
       ) as bal_pos
from res1;

1       123     1       1000    1
1       321     2       2980    2
1       321     3       -100    1
1       123     4       -200    0
1       123     5       100     1
1       321     6       -2100   1 -- sum from the first row up to the current row, but based on the value of the orderBy column (6), so both rows with orderBy = 6 get the same total
1       321     6       2000    1
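
The difference between the two queries can be reproduced with a small script. This is only a sketch: it uses Python's `sqlite3` module as a stand-in for Hive and assumes a SQLite build with window functions (3.25 or later). `SAMPLE` and `running_sum` are names invented for this example.

```python
import sqlite3

# Rows are (client, card, orderBy, bal), the same data as the t1 CTE above.
SAMPLE = [
    (1, 123, 1, 1000), (1, 321, 2, 2980), (1, 321, 3, -100), (1, 123, 4, -200),
    (1, 123, 5, 100), (1, 321, 6, 2000), (1, 321, 6, -2100),
]


def running_sum(window):
    """Return the bal_pos column computed with the given window definition."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (client INT, card INT, orderBy INT, bal INT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", SAMPLE)
    query = f"""
        SELECT SUM(CASE WHEN bal > 0 THEN 1 ELSE -1 END) OVER ({window}) AS bal_pos
        FROM t1
        ORDER BY orderBy, bal DESC
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result


# ROWS frame: one running total per physical row.
by_rows = running_sum(
    "ORDER BY orderBy, bal DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
# RANGE frame: rows with the same orderBy are peers and share one total.
by_range = running_sum(
    "ORDER BY orderBy RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
# No frame clause: with an ORDER BY, the default frame is the RANGE frame above.
by_default = running_sum("ORDER BY orderBy")

print(by_rows)     # [1, 2, 1, 0, 1, 2, 1]
print(by_range)    # [1, 2, 1, 0, 1, 1, 1]
print(by_default)  # [1, 2, 1, 0, 1, 1, 1]
```

Only the last two rows differ: with the RANGE frame, both rows at orderBy = 6 see the +1 and the -1 together, so each reports 1.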

Hope this helps.