Spark SQL: ORDER BY count DESC fails?

Asked: 2015-06-04 11:40:57

Tags: sql apache-spark

I have a table of books with two columns, `book` and `reader`, holding book and reader IDs respectively. When I try to order readers by the number of books they have read, I get an `AbstractSparkSQLParser` exception:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.sql.functions._

object Small {

  case class Book(book: Int, reader: Int)

  val recs = Array(
    Book(book = 1, reader = 30),
    Book(book = 2, reader = 10),
    Book(book = 3, reader = 20),
    Book(book = 1, reader = 20),
    Book(book = 1, reader = 10),
    Book(book = 1, reader = 40),
    Book(book = 2, reader = 40),
    Book(book = 2, reader = 30))

  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Small")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(recs).toDF()

    val readerGroups = df.groupBy("reader").count()
    readerGroups.show()

    readerGroups.registerTempTable("readerGroups")
    readerGroups.printSchema()

    // "SELECT reader, count FROM readerGroups ORDER BY count DESC"
    val readerGroupsSorted = sqlContext.sql("SELECT * FROM readerGroups ORDER BY count DESC")
    readerGroupsSorted.show()
    println("Group Cnt: " + readerGroupsSorted.count())
  }
}

Here is the output; the `groupBy` works fine:

    reader count
    40     2    
    10     2    
    20     2    
    30     2    

The resulting schema:

    root
     |-- reader: integer (nullable = false)
     |-- count: long (nullable = false)

However, `SELECT * FROM readerGroups ORDER BY count DESC` fails with the exception below. In fact, every other query fails as well; the only ones that work are `SELECT * FROM readerGroups` and `SELECT reader FROM readerGroups`. Why is that?

How can I make `ORDER BY count DESC` work?

    Exception in thread "main" java.lang.RuntimeException: [1.43] failure: ``('' expected but `desc' found

    SELECT * FROM readerGroups ORDER BY count DESC
                                              ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:40)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:933)
        at Small$.main(Small.scala:60)
        at Small.main(Small.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

2 Answers:

Answer 0 (score: 3)

The problem is the name of the column, `count`. `count` is a reserved word in Spark SQL, so you cannot reference it in a query, or sort by it, without escaping it.

You can try escaping it with backticks:

    SELECT * FROM readerGroups ORDER BY `count` DESC

Another option is to rename the count column to something different, such as `numReaders`.
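A third route (a sketch, not from the original answers): in Spark 1.x you can bypass the SQL parser entirely and sort through the DataFrame API, where the `count` column needs no escaping. The sample data here is hypothetical and skewed so the ordering is visible:

```scala
// Sketch: sort the grouped counts via the DataFrame API instead of SQL text,
// so the reserved word `count` is never fed to the SQL parser.
// Assumes spark-core and spark-sql 1.x on the classpath.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Book(book: Int, reader: Int)

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("OrderByCountSketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical data: reader 10 has read 3 books, reader 30 only 1
val df = sc.parallelize(Seq(
  Book(book = 1, reader = 30),
  Book(book = 2, reader = 10),
  Book(book = 3, reader = 10),
  Book(book = 1, reader = 10))).toDF()

// orderBy on a Column object; no SQL string, hence no keyword clash
val sorted = df.groupBy("reader").count().orderBy($"count".desc)
val rows = sorted.collect().map(r => (r.getInt(0), r.getLong(1)))
sc.stop()
```

Here `rows` starts with the reader holding the highest count, the same result the failing SQL query was meant to produce.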

Answer 1 (score: 0)

Use a derived table to order by a computed field (e.g. top, max, count, ...):

    SELECT * FROM (
        SELECT reader, count(book) AS book_count
        FROM readerbook
        GROUP BY reader
    ) a
    ORDER BY book_count DESC

Actually, on second thought, if you use an alias like this, you may be able to apply the ORDER BY directly:

    SELECT reader, count(book) AS book_count
    FROM readerbook
    GROUP BY reader
    ORDER BY book_count DESC
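As a sanity check on the expected result, the same group-count-sort can be reproduced with plain Scala collections and the question's sample data, no Spark required (a standalone sketch, not part of either answer):

```scala
// Plain-Scala equivalent of GROUP BY reader + count(*) + ORDER BY count DESC:
// count books per reader, then sort readers by that count, descending.
case class Book(book: Int, reader: Int)

val recs = Seq(
  Book(book = 1, reader = 30),
  Book(book = 2, reader = 10),
  Book(book = 3, reader = 20),
  Book(book = 1, reader = 20),
  Book(book = 1, reader = 10),
  Book(book = 1, reader = 40),
  Book(book = 2, reader = 40),
  Book(book = 2, reader = 30))

// (reader, count) pairs sorted by count in descending order
val counts: Seq[(Int, Int)] =
  recs.groupBy(_.reader).mapValues(_.size).toSeq.sortBy(-_._2)
```

With this data every reader has read exactly 2 books, matching the `groupBy` output shown in the question.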