SQL query over a percent-sign column in Spark

Date: 2015-12-23 18:47:00

Tags: scala apache-spark apache-spark-sql

I can't get a Spark SQL query to work on a column whose name contains a % sign. Suppose I have the following data frame:

// in spark-shell, where sqlContext.implicits._ is in scope for toDF
val df = sc.parallelize(Seq(("Peter",123,23.5),("John",45,45.5)))
           .toDF("Name","Age","score(%)")
df.registerTempTable("df")  // register so SQL queries can refer to the table as df
df.show

which gives the following table:

+-----+---+--------+
| Name|Age|score(%)|
+-----+---+--------+
|Peter|123|    23.5|
| John| 45|    45.5|
+-----+---+--------+

I can run:

sqlContext.sql("SELECT Name FROM df")

which shows:

+-----+
| Name|
+-----+
|Peter|
| John|
+-----+

But when I run:

sqlContext.sql("SELECT score(%) FROM df")

it throws the following error (the % seems to be the problem; I tried escaping it as \%, but that didn't help):

java.lang.RuntimeException: [1.14] failure: ``distinct'' expected but `%' found

SELECT score(%) FROM df
             ^
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
  at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
  at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
  at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
  at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
  at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
  at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137)
  at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237)
  at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249)
  at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
  at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249)
  at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217)
  at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
  at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881)
  at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
  at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
  at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
  at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
  at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:42)
  at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195)
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
  ... 48 elided

(I ran into this while programmatically ingesting a large number of CSV files with spark-csv. When I then try to run a SQL SELECT, I hit this % problem. I would like to avoid renaming the column headers if at all possible...)

1 Answer:

Answer 0 (score: 3)

Try delimiting the column name with backticks:

sqlContext.sql("SELECT `score(%)` FROM df")
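As a sketch of the same idea, here is a small helper (the name `quoteCol` is my own, not from the answer) that backtick-quotes an identifier before splicing it into a Spark SQL string, which is handy when the column names come from CSV headers you don't control; the commented Spark calls assume the question's sqlContext setup.

```scala
// Hypothetical helper: backtick-quote an identifier for Spark SQL.
// Backticks inside the name are doubled, mirroring Spark's own escaping.
def quoteCol(name: String): String =
  "`" + name.replace("`", "``") + "`"

val sql = s"SELECT ${quoteCol("score(%)")} FROM df"
// sql == "SELECT `score(%)` FROM df"

// sqlContext.sql(sql)          // with the temp table registered as "df"
// df.select(df("score(%)"))    // DataFrame API alternative: bypasses the SQL parser entirely
```

The DataFrame API call in the last comment sidesteps the parser altogether, so it also works without quoting, but the backtick form keeps you in plain SQL strings.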