How to perform a group by operation in Spark SQL? I am unable to group on one particular column while aggregating values across the other columns
Hi all, I want to perform a group by on top of an inner query in Spark SQL. Below is what I am trying to achieve.
val sqlResultjoin_groupInfo =
  spark.sql("""select sum(values) as volume,
              |       sum(disNumber) as disNumber, values
              |from (select *
              |      from dist a
              |      join map b on a.id=b.id) abc
              |group by values""".stripMargin)
I am getting the following error:
org.apache.spark.sql.catalyst.parser.ParseException: extraneous input ')' expecting {'(', 'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', ..., IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 53)
== SQL ==
If I run the same query aggregating only one column, it works:
val sqlResultjoin_groupInfo = spark.sql("select sum(values) as volume, values from (select * from dist a join map b on a.id=b.id) abc group by values")
Can someone help me with how to do a group by in Spark SQL?
Answer 0 (score: 1)
Reference every column in the outer query through the alias, i.e. alias.col. For example:
abc.values
You are selecting from an inline view, which you aliased abc, so the outer SELECT and GROUP BY should use abc.values (and abc.disNumber).
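To make that concrete, here is a minimal, self-contained sketch of the suggested fix. The sample rows, column types, and the local SparkSession setup are invented for illustration; the table and column names follow the question, and the query is the asker's with every outer-query column qualified by the inline view's alias abc:

import org.apache.spark.sql.SparkSession

object GroupBySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GroupBySketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the question's dist and map tables.
    Seq((1, 10.0, 2L), (1, 5.0, 1L), (2, 20.0, 3L))
      .toDF("id", "values", "disNumber")
      .createOrReplaceTempView("dist")
    Seq((1, "x"), (2, "y"))
      .toDF("id", "name")
      .createOrReplaceTempView("map")

    // Same shape as the question's query, but every column in the outer
    // SELECT and GROUP BY is qualified with the inline view's alias abc.
    val sqlResultjoin_groupInfo = spark.sql(
      """select sum(abc.values) as volume,
        |       sum(abc.disNumber) as disNumber,
        |       abc.values
        |from (select * from dist a join map b on a.id=b.id) abc
        |group by abc.values""".stripMargin)

    sqlResultjoin_groupInfo.show()
    spark.stop()
  }
}

If your Spark version still rejects the column name (values can collide with the SQL VALUES keyword), backquote it as abc.`values` to force it to parse as an identifier.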