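How to compute PostgreSQL queries using only RDD operations

Time: 2019-06-25 14:47:58

Tags: pyspark-sql

My task is to implement the following PostgreSQL queries using only RDD operations. Here are the queries:

Query 1: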

SELECT DISTINCT c.name, count(p.pid) FROM clubs c
JOIN teams t on c.cid = t.cid
JOIN tournaments d on t.tid = t.tid
JOIN players p on p.ncid = c.ncid
WHERE c.cid = 45 AND d.tyear = 2014
GROUP BY c.name
ORDER BY count DESC

Query 2:

SELECT DISTINCT t.tyear, c.name,
       (SELECT max(m.matchdate) - min(m.matchdate)
        FROM matches m
        WHERE t.tyear = date_part('year', m.matchdate)) AS days
FROM tournaments t
JOIN hosts h ON t.tyear = h.tyear
JOIN countries c on c.cid = h.cid
JOIN stadiums s on s.cid = c.cid
JOIN matches m on m.sid = s.sid
GROUP BY t.tyear, c.name, s.sid
ORDER BY days DESC

Does anyone know how to compute these queries using only RDD operations? Any help would be greatly appreciated.

0 answers:

No answers