Question

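I am using Apache Zeppelin version 0.6. I have the following Hive query:

select certificate_name, count(*) from student_withdraw

Now I want to add a where clause whose value is presented to the end user as a select list. The inner query looks like this:

select certificate_name, count(*) from student_withdraw where lecturer in (select distinct lecturer from student_withdraw)

The default notation for a select list is "${item=A,A|B|C}".

I tried the following:

%jdbc(hive)
select student_name, count(*) from student_withdraw where lecturer_name="${item=Null,select distinct lecturer_name from student_withdraw}" group by certificate_name

But I cannot get the distinct lecturers into the select list. All that shows up in the select list is the query itself.

How can I populate the select list with the distinct lecturers?

Thanks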

Answer 1

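Assumptions

Your scenario involves Zeppelin's dynamic forms. Your logic makes sense, but a dynamic form does not execute any SQL or HiveQL and pass the result to the page as options; it displays only the text you typed. I assume that your Zeppelin installation includes all interpreters, that the table is a managed native table in Hive, and that the end user must select a lecturer.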
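To make the point about literal options concrete, here is a toy Python model of how a select-form placeholder can be read as plain text. It is only an illustration under my own assumptions, not Zeppelin's actual parser; the pattern `FORM_PATTERN` and the function `parse_select_form` are hypothetical names introduced here.

```python
import re

# Toy model of '${name=default,opt1|opt2}'. NOT Zeppelin's real parser;
# it only illustrates that options are plain text split on '|'.
FORM_PATTERN = re.compile(
    r"\$\{(?P<name>[^=}]+)=(?P<default>[^,}]*),(?P<options>[^}]*)\}"
)


def parse_select_form(template):
    """Return (name, default, options) for the first placeholder found."""
    match = FORM_PATTERN.search(template)
    if match is None:
        raise ValueError("no select-form placeholder found")
    options = [opt.strip() for opt in match.group("options").split("|")]
    return match.group("name").strip(), match.group("default").strip(), options


# The documented notation yields three separate options.
name, default, options = parse_select_form("${item=A,A|B|C}")
print(name, default, options)

# The attempted query contains no '|', so it stays one literal option.
_, _, attempted = parse_select_form(
    "${item=Null,select distinct lecturer_name from student_withdraw}"
)
print(attempted)
```

In this model the query text is never executed; it simply becomes the single entry of the list, which matches what you see in the select list.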

Solution

If the number of unique lecturer names is small, say fewer than 10, just type them by hand into the select form of the query.

SELECT certificate_name, COUNT(*)
FROM student_withdraw
WHERE lecturer_name = '${item=nameA,nameA|nameB|nameC}'
GROUP BY certificate_name

Otherwise, you can first build a single string of all lecturer names, then copy and paste the result into the select form of the query, as follows:

%pyspark
from pyspark.sql import HiveContext

# Assumes `sc` is the SparkContext that Zeppelin provides to the paragraph.
hc = HiveContext(sc)
# Collect the distinct lecturer names from the Hive table.
rows = hc.sql('SELECT DISTINCT lecturer_name FROM student_withdraw').collect()
# Skip NULLs so that join() only receives strings.
lecturer_list = [row[0] for row in rows if row[0] is not None]
# Join with '|', the option separator of the select form.
lecturer_names = '|'.join(lecturer_list)
print(lecturer_names)

%jdbc(hive)
SELECT certificate_name, COUNT(*)
FROM student_withdraw
/* the second argument of the select form is copied from the output of the previous paragraph */
WHERE lecturer_name = '${item=nameA,nameA|nameB....|nameY|nameZ}'
GROUP BY certificate_name
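The copy-and-paste step above can be made less error-prone by generating the whole placeholder rather than only the joined names. The following is a minimal sketch in plain Python; `build_select_form` and the sample names are hypothetical, and rejecting `|` and `}` inside a name is a conservative assumption of mine rather than a documented Zeppelin rule.

```python
def build_select_form(form_name, names, default=None):
    """Return '${form_name=default,n1|n2|...}' for a list of option strings."""
    # Drop None/empty values and duplicates while keeping first-seen order.
    options = []
    for name in names:
        if name and name not in options:
            options.append(name)
    if not options:
        raise ValueError("at least one option is required")
    # Assumption: '|' separates options and '}' closes the placeholder,
    # so a name containing either would corrupt the generated text.
    for name in options:
        if "|" in name or "}" in name:
            raise ValueError("unsupported character in option: %r" % name)
    if default is None:
        default = options[0]
    return "${%s=%s,%s}" % (form_name, default, "|".join(options))


# Hypothetical sample data standing in for the collected lecturer names.
print(build_select_form("item", ["nameA", "nameB", "nameA", None, "nameC"]))
```

In the %pyspark paragraph you could pass `lecturer_list` to this helper and paste the printed placeholder, wrapped in single quotes, into the WHERE clause of the %jdbc(hive) paragraph.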
