How do I query a column whose name contains "$"?

Asked: 2017-03-06 01:59:33

Tags: json scala apache-spark apache-spark-sql

In Spark SQL, I can use

val spark = SparkSession
      .builder()
      .appName("SparkSessionZipsExample")
      .master("local")
      .config("spark.sql.warehouse.dir", "warehouseLocation-value")
      .getOrCreate()

val df = spark.read.json("source/myRecords.json")
df.createOrReplaceTempView("shipment")
val sqlDF = spark.sql("SELECT * FROM shipment")

to read data from "myRecords.json". The structure of this JSON file is:

df.printSchema()
root
 |-- _id: struct (nullable = true)
 |    |-- $oid: string (nullable = true)
 |-- container: struct (nullable = true)
 |    |-- barcode: string (nullable = true)
 |    |-- code: string (nullable = true)

I can get specific columns from this JSON, for example:

val sqlDF = spark.sql("SELECT container.barcode, container.code FROM shipment")

But how can I get _id.$oid from this JSON file? I tried "SELECT _id.$oid FROM shipment_log" and "SELECT _id.\$oid FROM shipment_log", but neither works. Error message:

 error: invalid escape character

Can anyone tell me how to get _id.$oid?

1 answer:

Answer 0 (score: 6):

Backticks are your friend:

spark.read.json(sc.parallelize(Seq(
  """{"_id": {"$oid": "foo"}}""")
)).createOrReplaceTempView("df")

spark.sql("SELECT _id.`$oid` FROM df").show
+----+
|$oid|
+----+
| foo|
+----+
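
Applied back to the question's own view, the same backtick quoting works, and you can alias the result so the "$" does not appear in the output column name. This is a minimal sketch assuming the shipment view registered in the question:

spark.sql("SELECT container.barcode, _id.`$oid` AS oid FROM shipment").show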

The same with the DataFrame API:

spark.table("df").select($"_id".getItem("$oid")).show
+--------+
|_id.$oid|
+--------+
|     foo|
+--------+

spark.table("df").select($"_id.$$oid")
+--------+
|_id.$oid|
+--------+
|     foo|
+--------+
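
If you'd rather avoid the $$ escape inside the $"..." interpolator, a hedged alternative (assuming the same df view) is to use col and getField, where the field name is an ordinary string and needs no escaping:

import org.apache.spark.sql.functions.col

// "$oid" is a plain string argument here, so no interpolation or escaping is involved
spark.table("df").select(col("_id").getField("$oid")).show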