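I am trying to insert data into MariaDB using PySpark and JDBC, but PySpark does not seem to generate correct SQL. My Spark version is 2.1.0. I did not have this problem before the cluster manager upgraded Spark from 1.6.1 to 2.1.0. Here is my Python code: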
from pyspark.sql import Row, SparkSession as SS

if __name__ == "__main__":
    spark = SS.builder.appName("boot_count").getOrCreate()
    sc = spark.sparkContext  # the attribute is sparkContext (lowercase "s")
    l = [(str(20160101), str(1))]  # one (day, count) tuple; brackets balanced
    rdd = sc.parallelize(l)
    rdd = rdd.map(lambda x: Row(day=x[0], count=x[1]))
    df = spark.createDataFrame(rdd)
    df.createOrReplaceTempView("boot_count")
    mysql_url = "jdbc:mariadb://master.cluster:3306/dbname"
    properties = {'user': 'root', 'driver': 'org.mariadb.jdbc.Driver'}
    # Append the DataFrame rows to the boot_count table over JDBC
    df.write.jdbc(url=mysql_url, table="boot_count", mode="append",
                  properties=properties)
Here is my error message:
Caused by: java.sql.SQLSyntaxErrorException: (conn:364) You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '"count","day") VALUES ('1','20160101')' at line 1 Query is : INSERT INTO boot_count ("count","day") VALUES ('1','20160101')
I solved the problem by running this command in MariaDB:
> set global sql_mode=ANSI_QUOTES;
Answer
Either put backticks around the column names, or use a setting that allows double quotes around column names.
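To illustrate the difference the answer describes, here is a minimal sketch using a hypothetical helper, `build_insert` (not Spark's own code), that builds the same INSERT statement with either double quotes or backticks around the column names. In MariaDB's default mode, double quotes delimit string literals, so the first form is a syntax error unless `ANSI_QUOTES` is enabled. The backtick form is accepted without changing `sql_mode`.

```python
def build_insert(table, columns, quote_char):
    """Build a parameterized INSERT, wrapping each column name in quote_char."""
    quoted_cols = ",".join(f"{quote_char}{c}{quote_char}" for c in columns)
    placeholders = ",".join("?" for _ in columns)
    return f"INSERT INTO {table} ({quoted_cols}) VALUES ({placeholders})"


cols = ["count", "day"]

# Double quotes: only valid identifier quoting when sql_mode includes ANSI_QUOTES
ansi_sql = build_insert("boot_count", cols, '"')

# Backticks: MariaDB/MySQL's default identifier quoting
mysql_sql = build_insert("boot_count", cols, "`")

print(ansi_sql)   # INSERT INTO boot_count ("count","day") VALUES (?,?)
print(mysql_sql)  # INSERT INTO boot_count (`count`,`day`) VALUES (?,?)
```

The first string has the same shape as the failing query in the error message above, which is why enabling `ANSI_QUOTES` on the server made the insert succeed.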