Question

I have set up my Spark cluster, and I have successfully connected Tableau to it through the Spark SQL connector.

I created my tables from the spark shell and saved the DataFrames loaded from MySQL as tables.

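How can I access the saved tables from Tableau? Do I need to provide the path of the warehouse directory when starting the Spark Thrift server? If so, how is that done, and if not, what should I do instead?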

Answer 1

Make sure spark-shell and the Thrift server point to the same metastore.

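Put simply, the metastore can be shared in two ways:

1. Start the shell and the Thrift server from the same location.
2. Set up a remote database for the metastore.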
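The first option works because, when no hive-site.xml is configured, Spark creates an embedded Derby metastore (a metastore_db directory) and a spark-warehouse directory in the current working directory. A minimal sketch, assuming a hypothetical shared directory /data/spark-shared and a SPARK_HOME default of /opt/spark (neither comes from the question); it only prints the commands so you can review them before running anything:

```shell
#!/usr/bin/env bash
# Sketch only: SHARED_DIR and the SPARK_HOME default are hypothetical placeholders.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
SHARED_DIR="${SHARED_DIR:-/data/spark-shared}"

# Print the command that launches a Spark tool from the shared directory, so that
# both tools resolve the same ./metastore_db and ./spark-warehouse.
launch_from_shared_dir() {
  printf 'cd %s && %s\n' "$SHARED_DIR" "$SPARK_HOME/$1"
}

launch_from_shared_dir bin/spark-shell             # create and save the tables here
launch_from_shared_dir sbin/start-thriftserver.sh  # then serve them from the same place
```

Note that an embedded Derby metastore is normally opened by one process at a time, so the shell would likely need to exit before the Thrift server starts. If both must run together, the remote-database option is the safer choice.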

You can pass Hive settings to the Spark Thrift server with --hiveconf and Spark settings with --conf:

    ./sbin/start-thriftserver.sh \
      --conf spark.sql.warehouse.dir=path/to/warehouse/dir \
      --hiveconf hive.server2.thrift.port=<listening-port> \
      --hiveconf hive.server2.thrift.bind.host=<listening-host> \
      --master <master-uri> ...
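Before involving Tableau, one way to check that the Thrift server actually sees the saved tables is to query it with beeline, which ships with Spark. A rough sketch, assuming the server listens on localhost:10000 and SPARK_HOME is /opt/spark (both are placeholder defaults, not values from the question); it prints the command instead of running it:

```shell
#!/usr/bin/env bash
# Sketch only: host, port and SPARK_HOME defaults are assumptions for illustration.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
THRIFT_HOST="${THRIFT_HOST:-localhost}"
THRIFT_PORT="${THRIFT_PORT:-10000}"

# Build the JDBC URL that beeline uses to reach the Thrift server.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s' "$1" "$2"
}

# Print the beeline command that lists the tables visible through the Thrift server.
show_tables_cmd() {
  printf '%s -u %s -e "SHOW TABLES;"\n' \
    "$SPARK_HOME/bin/beeline" "$(jdbc_url "$THRIFT_HOST" "$THRIFT_PORT")"
}

show_tables_cmd
```

If the tables saved from the shell do not show up in that listing, the two processes are most likely still using different metastores or warehouse directories.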

Providing the warehouse directory path to the Spark Thrift server
