EOF in multi-line string with Spark

Date: 2018-10-12 19:09:49

Tags: azure apache-spark pyspark-sql

The following code produces this error:

    ERROR: An unexpected error occurred while tokenizing input
    The following traceback may be corrupted or invalid
    The error message is: ('EOF in multi-line string', (1, 23))

This runs on a Spark cluster in Azure Databricks. It is strange because the same code works fine on my Linux Spark cluster.
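Since the tokenizer error points at column 23 of the first line (right where the triple-quoted string opens), one possible cause is an invisible or non-ASCII character, such as "curly" quotes picked up when pasting code into the notebook. A minimal diagnostic sketch (the helper name and the sample snippet are illustrative, not from the original code):

```python
def find_suspicious_chars(source):
    """Return (offset, char) pairs for non-ASCII characters that can
    break Python tokenization (e.g. curly quotes pasted from a browser)."""
    return [(i, ch) for i, ch in enumerate(source) if ord(ch) > 127]

# Hypothetical example: curly quotes instead of straight quotes around the SQL
bad = 'spark.sql(\u201c\u201c\u201cSELECT 1\u201d\u201d\u201d)'
print(find_suspicious_chars(bad))
```

Pasting the failing cell's text into a check like this would quickly confirm or rule out hidden-character corruption.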

import sys

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "dc4f6b60-f13f-4af2-936d-04e16cd12642")
spark.conf.set("dfs.adls.oauth2.credential", "JTzYVPJnwwaHd5axSBFIf0LDgr5ed9E5CfEsCWvgj7I=")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/1eea4c39-2312-4936-80b7-99b419d587e5/oauth2/token")

df1 = spark.read.json("adl://carlslake.azuredatalakestore.net/jfolder2/Data_Country.json")
df2 = spark.read.json("adl://carlslake.azuredatalakestore.net/jfolder2/Data_Customer.json")
df3 = spark.read.json("adl://carlslake.azuredatalakestore.net/jfolder2/Data_Sales.json")
df4 = spark.read.json("adl://carlslake.azuredatalakestore.net/jfolder2/Data_SalesDetails.json")
df5 = spark.read.json("adl://carlslake.azuredatalakestore.net/jfolder2/Data_Stock.json")

df1.createOrReplaceTempView('Data_Country')
df2.createOrReplaceTempView('Data_Customer')
df3.createOrReplaceTempView('Data_Sales')
df4.createOrReplaceTempView('Data_SalesDetails')
df5.createOrReplaceTempView('Data_Stock')

# This query was put together in dbForge; it shows how to do joins
example1 = spark.sql("""SELECT
  CF.CountryName AS CountrySold
 ,COUNT(CF.CountryName) AS soldincountry
 ,MAX(CB.SalesDetailsID) AS CarsSold
FROM Data_Stock CS
INNER JOIN Data_SalesDetails CB
  ON CS.StockCode = CB.StockID
INNER JOIN Data_Sales CD
  ON CB.SalesID = CD.SalesID
INNER JOIN Data_Customer CG
  ON CD.CustomerID = CG.CustomerID
INNER JOIN Data_Country CF
  ON CG.Country = CF.CountryISO2
GROUP BY CF.CountryName""")
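As a workaround sketch, the query can be assembled from single-line string fragments so that no triple-quoted (multi-line) literal ever reaches the notebook's tokenizer. This is an assumption-driven workaround for the tokenizer error, not a confirmed fix:

```python
# Build the same SQL as one single-line string, avoiding a multi-line literal
query = " ".join([
    "SELECT CF.CountryName AS CountrySold,",
    "COUNT(CF.CountryName) AS soldincountry,",
    "MAX(CB.SalesDetailsID) AS CarsSold",
    "FROM Data_Stock CS",
    "INNER JOIN Data_SalesDetails CB ON CS.StockCode = CB.StockID",
    "INNER JOIN Data_Sales CD ON CB.SalesID = CD.SalesID",
    "INNER JOIN Data_Customer CG ON CD.CustomerID = CG.CustomerID",
    "INNER JOIN Data_Country CF ON CG.Country = CF.CountryISO2",
    "GROUP BY CF.CountryName",
])
# Then run it as before:
# example1 = spark.sql(query)
```

If this version runs, the problem lies in how the notebook tokenizes the triple-quoted string rather than in the SQL itself.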

Any ideas where I might be going wrong?

0 Answers:

There are no answers