How do I use the WHERE keyword in pyspark.sql?

Time: 2019-02-07 19:04:17

Tags: apache-spark pyspark pyspark-sql

How can I use the WHERE keyword to get the count and percentage of each sex among the passengers who survived the Titanic disaster?

My code:

spark.sql(
    "SELECT Sex Where Survived=1 ,count(Sex) \
    as gender_count,count(sex)*100/sum(count(sex)) over() \
    as percent from titanic_table GROUP BY sex"
).show()

Error:

ParseException: "
mismatched input ',' expecting <EOF>(line 1, pos 28)
== SQL ==
SELECT Sex Where Survived=1 ,count(Sex) 
as gender_count,count(sex)*100/sum(count(sex)) over() 
as percent from titanic_table GROUP BY sex
----------------------------^^^
"

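1 answer:

Answer 0: (score: 0)

The WHERE clause belongs after FROM and before GROUP BY. In your query it sits inside the SELECT list, which is why the parser stops at the comma that follows it (line 1, pos 28).

Your code should be: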

spark.sql("SELECT Sex, count(Sex) AS gender_count, \
100*count(sex)/sum(count(sex)) over() AS percent \
FROM titanic_table \
WHERE Survived = 1 \
GROUP BY sex").show()
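
To see the clause order at work without a Spark session, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module as a stand-in for Spark. The helper name survivor_share_by_sex and the sample rows are made up for illustration and are not the real Titanic data. SQLite divides integers as integers, so the sketch multiplies by 100.0, whereas Spark SQL's / already returns a fractional value.

```python
import sqlite3


def survivor_share_by_sex(rows):
    """Return (sex, count, percent) for survivors only, sorted by sex.

    Hypothetical helper: SQLite stands in for Spark here, and the table
    layout (Sex, Survived) is assumed from the query in the question.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE titanic_table (Sex TEXT, Survived INTEGER)")
        conn.executemany("INSERT INTO titanic_table VALUES (?, ?)", rows)
        # Clause order: SELECT ... FROM ... WHERE ... GROUP BY ...
        # WHERE filters rows before grouping, so only survivors are counted.
        query = """
            SELECT Sex,
                   COUNT(Sex) AS gender_count,
                   100.0 * COUNT(Sex) / SUM(COUNT(Sex)) OVER () AS percent
            FROM titanic_table
            WHERE Survived = 1
            GROUP BY Sex
        """
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


# Made-up sample rows: (Sex, Survived)
sample_rows = [
    ("female", 1), ("female", 1), ("female", 1), ("male", 1),
    ("female", 0), ("male", 0), ("male", 0), ("male", 0),
]

result = survivor_share_by_sex(sample_rows)
for sex, gender_count, percent in result:
    print(sex, gender_count, percent)
```

Because the filter runs before the grouping, the percentages are shares of the survivors only, not of all passengers.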