I am using PySpark with SQLContext, which lets you write SQL queries inside the framework. For some reason this command does not work, and I am not sure why.
complaint_by_city = sqlContext.sql('SELECT City, COUNT(*) as `city_comp` '
'FROM c311 '
'GROUP BY City '
'COLLATE NOCASE '
'ORDER BY -city_comp '
'LIMIT 21 ')
Edit: the error it gives me is:
ParseException: u"\nmismatched input 'COLLATE' expecting {<EOF>, ',', '.', '[', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'ASC', 'DESC', 'WINDOW', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 81)\n\n== SQL ==\nSELECT City, COUNT(*) as `city_comp` FROM c311 GROUP BY City ORDER BY -city_comp COLLATE NOCASELIMIT 21 \n---------------------------------------------------------------------------------^^^\n"
Answer 0 (score: 3)
May I suggest:
SELECT LOWER(City) as City, COUNT(*) as city_comp
FROM c311
GROUP BY LOWER(City)
ORDER BY city_comp DESC
LIMIT 21;
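For context, `COLLATE NOCASE` is SQLite syntax, and the ParseException in the question shows that the Spark SQL parser in use there does not expect `COLLATE` at that position. Lowercasing the column before grouping gives the same case-insensitive count. Below is a minimal sketch of how the query above might be wired back into the PySpark call from the question. The helper name `complaints_by_city` and the constant `QUERY` are made up for illustration, and the sketch assumes the table is already registered as `c311`. Each fragment ends with a space so that adjacent string literals do not fuse together (the error text shows `NOCASELIMIT`, which is what happens when a separator is missing).

```python
# Case-insensitive complaint counts per city, written for Spark SQL.
# LOWER(City) replaces the SQLite-style COLLATE NOCASE clause.
QUERY = (
    "SELECT LOWER(City) AS City, COUNT(*) AS city_comp "
    "FROM c311 "
    "GROUP BY LOWER(City) "
    "ORDER BY city_comp DESC "
    "LIMIT 21"
)


def complaints_by_city(sql_context):
    """Run QUERY through any object that exposes a .sql(str) method.

    In the question this would be the existing sqlContext; the function
    name is hypothetical and only wraps the call for reuse.
    """
    return sql_context.sql(QUERY)
```

With the `sqlContext` from the question, usage would look like `complaint_by_city = complaints_by_city(sqlContext)` followed by `complaint_by_city.show()`.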