Implementing ROW_NUMBER() in beamSQL

Time: 2018-06-06 15:47:03

Tags: apache-beam

I have the following query:

SELECT DISTINCT Summed, ROW_NUMBER () OVER (order by Summed desc) as Rank  from table1

I have to write it in Apache Beam (beamSql). Here is my code:

PCollection<BeamRecord> rec_2_part2 = rec_2.apply(BeamSql.query("SELECT DISTINCT Summed, ROW_NUMBER(Summed) OVER (ORDER BY Summed) Rank1 from PCOLLECTION "));

But I get the following error:

Caused by: java.lang.UnsupportedOperationException: Operator: ROW_NUMBER is not supported yet!

Any idea how to implement ROW_NUMBER() in beamSql?
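The following is a minimal plain-Java sketch of the result the query appears to be after. It does not use Beam. It assumes the goal is to number the distinct Summed values from largest to smallest.

In standard SQL, DISTINCT is applied after the window function. A literal reading of the query would therefore not collapse duplicate Summed values, because ROW_NUMBER makes every row unique. For that reason the sketch de-duplicates first, as the first answer does.

The class and method names (RowNumberSketch, rowNumberDesc) are illustrative only. The input is assumed to contain no nulls.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch (not Beam API): ROW_NUMBER() OVER (ORDER BY Summed DESC)
// applied to the distinct Summed values. Assumes the input contains no nulls.
class RowNumberSketch {

    // Returns each distinct value mapped to its 1-based position in descending
    // order; the map iterates in ranking order.
    static Map<Long, Integer> rowNumberDesc(List<Long> summed) {
        // TreeSet drops duplicates; reverseOrder() sorts largest first.
        TreeSet<Long> distinctDesc = new TreeSet<>(Collections.reverseOrder());
        distinctDesc.addAll(summed);

        Map<Long, Integer> rowNumbers = new LinkedHashMap<>();
        int next = 1;
        for (Long value : distinctDesc) {
            rowNumbers.put(value, next++);
        }
        return rowNumbers;
    }

    public static void main(String[] args) {
        List<Long> summed = Arrays.asList(40L, 75L, 40L, 90L, 10L);
        System.out.println(rowNumberDesc(summed)); // {90=1, 75=2, 40=3, 10=4}
    }
}
```

In a pipeline, this kind of ranking needs a global view of all distinct values. Any non-SQL workaround therefore has to bring the values together in one place before numbering them.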

2 answers:

Answer 0 (score: 1)

Here is one way to approximate your current query without using ROW_NUMBER:

SELECT t1.Summed,
       (SELECT COUNT(*)
        FROM (SELECT DISTINCT Summed FROM table1) t2
        WHERE t2.Summed >= t1.Summed) AS Rank
FROM (SELECT DISTINCT Summed FROM table1) t1

The basic idea is to first use a subquery to get a table containing only the distinct Summed values. Then a correlated subquery simulates the row number. This is not a very efficient approach, but if ROW_NUMBER is not available, you are stuck with some alternative.
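The correlated-subquery idea can be illustrated outside SQL with a small plain-Java sketch. The names are illustrative and not part of Beam.

For each distinct value v, the rank is the number of distinct values w with w >= v. Because the values are distinct, there are no ties, so this count equals the row number in descending order. The nested loop is quadratic in the number of distinct values, which is the inefficiency mentioned above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch (not Beam API) of the correlated-subquery trick:
// Rank(v) = number of distinct values w with w >= v.
class CorrelatedRankSketch {

    // Mirrors: (SELECT COUNT(*) FROM (SELECT DISTINCT Summed FROM table1) t2
    //           WHERE t2.Summed >= t1.Summed)
    static long rankOf(long value, Set<Long> distinct) {
        long count = 0;
        for (long other : distinct) {
            if (other >= value) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Long> summed = Arrays.asList(40L, 75L, 40L, 90L, 10L);
        // Mirrors: FROM (SELECT DISTINCT Summed FROM table1) t1
        Set<Long> distinct = new LinkedHashSet<>(summed);
        // Outer loop over t1 times inner loop in rankOf over t2: O(n^2).
        for (long v : distinct) {
            System.out.println(v + " -> " + rankOf(v, distinct));
        }
    }
}
```

With the sample input, the program prints 40 -> 3, 75 -> 2, 90 -> 1 and 10 -> 4, in the set's insertion order. Like the SQL, which has no ORDER BY, it does not sort the output rows.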

Answer 1 (score: 0)

A solution that works for the above query:

PCollection<BeamRecord> rec_2 = rec_1.apply(BeamSql.query("SELECT max(Summed) as maxed, max(Summed)-10 as least, 'a' as Dummy from PCOLLECTION"));
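To make clear what this query returns, here is a plain-Java sketch of the same aggregation. The class and field names are hypothetical.

The result is a single row holding three values: the largest Summed value, that value minus 10, and the constant 'a'. It does not number individual rows, and the answer does not show how this row is used afterwards.

One behavioral difference: SQL's max over an empty input yields NULL, whereas Collections.max throws NoSuchElementException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch (not Beam API) of:
// SELECT max(Summed) AS maxed, max(Summed)-10 AS least, 'a' AS Dummy FROM PCOLLECTION
class MaxMinusTenSketch {

    // Hypothetical holder for the single output row.
    static final class Result {
        final long maxed;
        final long least;
        final String dummy;

        Result(long maxed, long least, String dummy) {
            this.maxed = maxed;
            this.least = least;
            this.dummy = dummy;
        }

        @Override
        public String toString() {
            return "maxed=" + maxed + ", least=" + least + ", Dummy=" + dummy;
        }
    }

    static Result compute(List<Long> summed) {
        // max(Summed); unlike SQL, this throws on an empty list instead of returning NULL.
        long maxed = Collections.max(summed);
        return new Result(maxed, maxed - 10, "a");
    }

    public static void main(String[] args) {
        List<Long> summed = Arrays.asList(40L, 75L, 40L, 90L, 10L);
        System.out.println(compute(summed)); // maxed=90, least=80, Dummy=a
    }
}
```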