How to perform group by and aggregation in Spark SQL

Date: 2019-01-08 19:48:26

Tags: apache-spark dataframe apache-spark-sql

How do I perform a group by in Spark SQL? I am unable to group by one particular column while aggregating values from the other columns.

Hi everyone, I want to perform a group by on top of an inner query in Spark SQL. Below is what I am trying to do.

      val sqlResultjoin_groupInfo =
        spark.sql("""select sum(values) as volume,
                            sum(disNumber) as disNumber, values
                       from (select *
                               from dist a
                               join map b on a.id = b.id) abc
                      group by values""")

I am getting the following error:

    org.apache.spark.sql.catalyst.parser.ParseException:
    extraneous input ')' expecting {'(', 'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'AFTER', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', '+', '-', '*', 'DIV', '~', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', STRING, BIGINT_LITERAL, SMALLINT_LITERAL, TINYINT_LITERAL, INTEGER_VALUE, DECIMAL_VALUE, DOUBLE_LITERAL, BIGDECIMAL_LITERAL, IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 53)

     == SQL ==

If I run the same query with only one column, I do get a result:

   val sqlResultjoin_groupInfo =
     spark.sql("select sum(values) as volume, values from (select * from dist a join map b on a.id = b.id) abc group by values")

Can someone help me with how to do a group by in Spark SQL?

1 Answer:

Answer 0 (score: 1)

Reference every item in the outer query as alias.col, for example

  abc.values

You are using an inline view, namely abc.
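
For illustration, here is a minimal sketch of the question's query rewritten with every outer-query column qualified by the inline-view alias, as the answer suggests. The table and column names (dist, map, id, values, disNumber) are assumed from the question, and the query is untested; it only shows the aliasing pattern.

    // Sketch only: each column in the outer select and in the group by is
    // qualified with the inline-view alias abc, per the answer's suggestion.
    // Table/column names are taken from the question; untested.
    val sqlResultjoin_groupInfo =
      spark.sql("""select sum(abc.values) as volume,
                          sum(abc.disNumber) as disNumber,
                          abc.values
                     from (select * from dist a join map b on a.id = b.id) abc
                    group by abc.values""")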