It looks like this fails:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .mode("overwrite")
    .bucketBy(32, "column")
    .sortBy("column")
    .parquet("s3://....");
with the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now;
    at org.apache.spark.sql.DataFrameWriter.assertNotBucketed(DataFrameWriter.scala:314)
I see that saveAsTable("myfile") is still supported, but it only writes locally. Once the job has finished, how do I take the saveAsTable(...) output and put it on S3?
Answer 0 (score: 8)
You can use it like this:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .option("path", "s3://....")
    .mode("overwrite")
    .format("parquet")
    .bucketBy(32, "column")
    .sortBy("column")
    .saveAsTable("tableName");
This creates an external table that points to the S3 location. The key part is .option("path", "s3://....").
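
For readers unsure what bucketBy(32, "column") followed by sortBy("column") asks for, here is a simplified plain-Java sketch of the idea: each row is assigned to one of a fixed number of buckets by hashing the bucketing column, and rows are sorted within each bucket. It is an illustration only, not Spark's implementation. The class BucketSketch and the methods bucketFor and assign are hypothetical names, and String.hashCode is a stand-in for Spark's own Murmur3-based hash, so the bucket ids printed here will not match the ones Spark assigns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketSketch {

    // Hypothetical helper: map a column value to a bucket id in [0, numBuckets).
    // Assumption: String.hashCode stands in for Spark's real hash function.
    public static int bucketFor(String value, int numBuckets) {
        // floorMod keeps the result non-negative even when hashCode is negative.
        return Math.floorMod(value.hashCode(), numBuckets);
    }

    // Group values by bucket id, then sort inside each bucket,
    // loosely mirroring bucketBy(n, col) followed by sortBy(col).
    public static Map<Integer, List<String>> assign(List<String> values, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String value : values) {
            int id = bucketFor(value, numBuckets);
            buckets.computeIfAbsent(id, k -> new ArrayList<>()).add(value);
        }
        for (List<String> rows : buckets.values()) {
            Collections.sort(rows);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample values for the bucketing column.
        List<String> values = Arrays.asList("carol", "alice", "bob", "alice");
        assign(values, 32).forEach((id, rows) ->
                System.out.println("bucket " + id + " -> " + rows));
    }
}
```

Equal values always land in the same bucket. That bucket layout has to be recorded as table metadata, which is consistent with the error above: the plain save path rejects bucketing, while saveAsTable accepts it.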