I have a table with warehouse entries and exits. I want to turn it into multiple rows per day and then calculate the storage cost for each day.
+----------+----------------+------------+------------+----------------+--------------+
| material | wasting_time_a | indate     | outdate    | count_material | storage_cost |
+----------+----------------+------------+------------+----------------+--------------+
| 963651   | 5              | 2016-12-02 | 2016-12-06 | 2              | 0.04357      |
| 963651   | 6              | 2016-12-02 | 2016-12-07 | 1              | 0.02615      |
| 963651   | 7              | 2016-12-02 | 2016-12-08 | 1              | 0.0305       |
| 963651   | 11             | 2016-12-02 | 2016-12-12 | 4              | 0.1917       |
| 963651   | 12             | 2016-12-02 | 2016-12-13 | 1              | 0.05229      |
| 963651   | 13             | 2016-12-02 | 2016-12-14 | 3              | 0.1699       |
| 963651   | 14             | 2016-12-02 | 2016-12-15 | 9              | 0.5490       |
| 963651   | 15             | 2016-12-02 | 2016-12-16 | 7              | 0.4575       |
| 963651   | 16             | 2016-12-02 | 2016-12-17 | 2              | 0.1394       |
| 963651   | 18             | 2016-12-02 | 2016-12-19 | 5              | 0.3922       |
| 963651   | 19             | 2016-12-02 | 2016-12-20 | 6              | 0.4968       |
| 963651   | 20             | 2016-12-02 | 2016-12-21 | 6              | 0.5229       |
| 963651   | 21             | 2016-12-02 | 2016-12-22 | 2              | 0.1830       |
| 963651   | 22             | 2016-12-02 | 2016-12-23 | 1              | 0.0959       |
| 963651   | 2              | 2016-12-22 | 2016-12-23 | 2              | 0.01743      |
| 963651   | 9              | 2016-12-22 | 2016-12-30 | 3              | 0.1177       |
| 963651   | 10             | 2016-12-22 | 2016-12-31 | 1              | 0.04357      |
| 963651   | 12             | 2016-12-22 | 2017-01-02 | 1              | 0.05229      |
| 963651   | 14             | 2016-12-22 | 2017-01-04 | 2              | 0.1220       |
+----------+----------------+------------+------------+----------------+--------------+
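Waiting_time_a (the wasting_time_a column) is just the number of days between indate and outdate, counting both days (2016-12-02 to 2016-12-06 gives 5). Count_material is the quantity of material taken out of the warehouse on outdate.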
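In the sample rows, wasting_time_a is always the plain day difference plus one, i.e. both indate and outdate are counted as storage days. A small check against three rows copied from the table; `waiting_time` is a hypothetical helper name, not something from the question:

```python
from datetime import date


def waiting_time(indate: date, outdate: date) -> int:
    """Hypothetical helper: days in storage, counting both indate and outdate."""
    return (outdate - indate).days + 1


# (wasting_time_a, indate, outdate) copied from the question's table.
SAMPLE = [
    (5, date(2016, 12, 2), date(2016, 12, 6)),
    (22, date(2016, 12, 2), date(2016, 12, 23)),
    (12, date(2016, 12, 22), date(2017, 1, 2)),
]

for expected, indate, outdate in SAMPLE:
    # Every sample row matches the inclusive count.
    assert waiting_time(indate, outdate) == expected

print(waiting_time(date(2016, 12, 2), date(2016, 12, 6)))
```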
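The storage cost grows every day, so I need to calculate each day separately. My idea is to do it like this: from each row I generate multiple rows, one for every day from indate to outdate, and at the end I sum them up. Based on my table, the storage cost for 2016-12-02 would be the SUM over the count_material group. But I don't know how to calculate 2016-12-03.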
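The "one row per stored day, then sum" idea can be sketched in plain Python before translating it to SQL. This assumes each batch is charged storage_cost / wasting_time_a on every day from indate to outdate, both ends included (the same per-day split Answer 0 below uses). That assumption fits the sample data: every row's storage_cost is close to 0.0043575 × count_material × wasting_time_a. `daily_storage_cost` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta

# (wasting_time_a, indate, outdate, count_material, storage_cost) from the question.
ROWS = [
    (5, "2016-12-02", "2016-12-06", 2, 0.04357),
    (6, "2016-12-02", "2016-12-07", 1, 0.02615),
    (7, "2016-12-02", "2016-12-08", 1, 0.0305),
    (11, "2016-12-02", "2016-12-12", 4, 0.1917),
    (12, "2016-12-02", "2016-12-13", 1, 0.05229),
    (13, "2016-12-02", "2016-12-14", 3, 0.1699),
    (14, "2016-12-02", "2016-12-15", 9, 0.5490),
    (15, "2016-12-02", "2016-12-16", 7, 0.4575),
    (16, "2016-12-02", "2016-12-17", 2, 0.1394),
    (18, "2016-12-02", "2016-12-19", 5, 0.3922),
    (19, "2016-12-02", "2016-12-20", 6, 0.4968),
    (20, "2016-12-02", "2016-12-21", 6, 0.5229),
    (21, "2016-12-02", "2016-12-22", 2, 0.1830),
    (22, "2016-12-02", "2016-12-23", 1, 0.0959),
    (2, "2016-12-22", "2016-12-23", 2, 0.01743),
    (9, "2016-12-22", "2016-12-30", 3, 0.1177),
    (10, "2016-12-22", "2016-12-31", 1, 0.04357),
    (12, "2016-12-22", "2017-01-02", 1, 0.05229),
    (14, "2016-12-22", "2017-01-04", 2, 0.1220),
]


def daily_storage_cost(rows):
    """Explode each batch into one entry per stored day and sum the cost per day.

    Assumption: a batch costs storage_cost / wasting_time_a on each day from
    indate to outdate inclusive.
    """
    totals = defaultdict(float)
    for wait, indate_s, outdate_s, _count, cost in rows:
        day = date.fromisoformat(indate_s)
        outdate = date.fromisoformat(outdate_s)
        per_day = cost / wait
        while day <= outdate:  # one "exploded" row per day in stock
            totals[day] += per_day
            day += timedelta(days=1)
    return dict(sorted(totals.items()))


daily = daily_storage_cost(ROWS)
for day, total in list(daily.items())[:3]:
    print(day, round(total, 5))
```

Because every batch is spread over exactly wasting_time_a days, the per-day totals add back up to the sum of storage_cost, which is a handy sanity check for any SQL version.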
I'm using Impala, but any SQL would help :)
The result should look like this:
+------------+--------------------+
| DATE       | total_storage_cost |
+------------+--------------------+
| 2016-12-02 | 40                 |
| 2016-12-03 | 47                 |
| 2016-12-04 | ...                |
| 2016-12-05 | ...                |
| 2016-12-06 | ...                |
+------------+--------------------+
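Thanks for your help
Answer 0 (score: 1)
This might be a start. But how do you want to handle the days that are not in the table? Could you explain that a bit more?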
-- Calendar table: one row per day that should appear in the output.
DECLARE @D TABLE(mDate DATE);
INSERT INTO @D VALUES ('20161201'),('20161202'),('20161203'),('20161204'),('20161205'),('20161206'),('20161207'),('20161208'),
('20161209'),('20161210'),('20161211'),('20161212'),('20161213'),('20161214'),('20161215'),('20161216'),
('20161217'),('20161218'),('20161219'),('20161220'),('20161221'),('20161222'),('20161223'),('20161224');
-- Sample data from the question (storage_cost at full precision).
DECLARE @T TABLE(Material INT
,Waiting_Time_a INT
,Indate DATE
,Outdate DATE
,Count_Material INT
,Storage_cost DECIMAL(18,10));
INSERT INTO @T VALUES
(963651,5,'20161202','20161206',2,0.0435749999),
(963651,6,'20161202','20161207',1,0.026145),
(963651,7,'20161202','20161208',1,0.0305025000),
(963651,11,'20161202','20161212',4,0.19173),
(963651,12,'20161202','20161213',1,0.05229),
(963651,13,'20161202','20161214',3,0.1699425),
(963651,14,'20161202','20161215',9,0.5490449999),
(963651,15,'20161202','20161216',7,0.4575375),
(963651,16,'20161202','20161217',2,0.13944),
(963651,18,'20161202','20161219',5,0.3921750000),
(963651,19,'20161202','20161220',6,0.4967549999),
(963651,20,'20161202','20161221',6,0.522899999),
(963651,21,'20161202','20161222',2,0.183015),
(963651,22,'20161202','20161223',1,0.095865),
(963651,2,'20161222','20161223',2,0.01743),
(963651,9,'20161222','20161230',3,0.1176525000),
(963651,10,'20161222','20161231',1,0.04357499999),
(963651,12,'20161222','20170102',1,0.05229),
(963651,14,'20161222','20170104',2,0.1220100000);
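-- Daily cost of a batch = Storage_cost / Waiting_Time_a, summed per Indate.
-- This only matches a calendar day to batches that ARRIVE on that day,
-- so the days between Indate and Outdate come back with a NULL dCost.
SELECT d.mDate, t.dCost
FROM @D AS d
LEFT OUTER JOIN
(SELECT Indate, SUM(Storage_cost / Waiting_Time_a) AS dCost
FROM @T
GROUP BY Indate) AS t
ON t.Indate = d.mDate
ORDER BY d.mDate;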
Answer 1 (score: 0)
In some RDBMSs this can be done with a recursive CTE, but Impala doesn't support them.
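For comparison, in an engine that does support recursive CTEs, the list of days can be generated on the fly. A minimal sketch using SQLite through Python's built-in sqlite3 module; SQLite is only a stand-in here (this will not run on Impala), and `date_series` is a hypothetical helper name:

```python
import sqlite3

# Recursive CTE: start at :start and keep adding one day until :end is reached.
DATE_SERIES_SQL = """
WITH RECURSIVE calendar(d) AS (
    SELECT DATE(:start)
    UNION ALL
    SELECT DATE(d, '+1 day') FROM calendar WHERE d < DATE(:end)
)
SELECT d FROM calendar ORDER BY d
"""


def date_series(start, end):
    """Hypothetical helper: list of 'YYYY-MM-DD' strings, both ends included."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.execute(DATE_SERIES_SQL, {"start": start, "end": end})
        return [row[0] for row in cur]
    finally:
        conn.close()


days = date_series("2016-12-02", "2017-01-04")
print(days[0], days[-1], len(days))
```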
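For your case, I suggest creating a table with only a single column that holds dates in daily increments (2016-12-02, 2016-12-03, 2016-12-04, ...).
If you don't want to write code for it, you can build it in MS Excel, export it to CSV, and then import it into HDFS.
Then you can use this table in the join and filter on the date range.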
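A minimal sketch of that calendar-table join, again in SQLite via Python's sqlite3 so it runs anywhere; it has not been tested on Impala, where the table definitions and loading step would differ. The table names `calendar` and `storage` are hypothetical, a Python loop stands in for the Excel/CSV/HDFS step, and each batch is assumed to cost storage_cost / wasting_time_a per stored day:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (d TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE storage (material INTEGER, wasting_time_a INTEGER, indate TEXT, "
    "outdate TEXT, count_material INTEGER, storage_cost REAL)"
)

# One-column calendar table with daily increments (stand-in for the CSV import).
start, end = date(2016, 12, 2), date(2017, 1, 4)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range((end - start).days + 1)],
)

# A few rows from the question's table.
conn.executemany(
    "INSERT INTO storage VALUES (?, ?, ?, ?, ?, ?)",
    [
        (963651, 5, "2016-12-02", "2016-12-06", 2, 0.04357),
        (963651, 6, "2016-12-02", "2016-12-07", 1, 0.02615),
        (963651, 2, "2016-12-22", "2016-12-23", 2, 0.01743),
    ],
)

# Range join: each calendar day picks up every batch in stock on that day.
DAILY_SQL = """
SELECT c.d AS day, SUM(s.storage_cost / s.wasting_time_a) AS total_storage_cost
FROM calendar AS c
JOIN storage AS s
  ON c.d BETWEEN s.indate AND s.outdate
GROUP BY c.d
ORDER BY c.d
"""
result = conn.execute(DAILY_SQL).fetchall()
for day, total in result:
    print(day, round(total, 5))
```

The inner join drops days with nothing in stock; switching to `calendar LEFT JOIN storage` would keep those days with a NULL total instead.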
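Answer 2 (score: 0)
@all. I've been following these Hadoop discussions for a few weeks. I've read more than 1,000 posts on SO, and I'm also reading several books about Hadoop. This particular post makes Hadoop feel just like SQL Server to me; I don't really see a big difference between SQL Server and Hadoop. Then again, as I mentioned above, I'm quite new to the whole Hadoop concept. I've been using SQL Server for almost 10 years. If someone like me already knows SQL Server very well, is there a big advantage to learning Hadoop? I'm just curious about this.
Thanks.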