Spark command:
spark-submit \
--class com.dev.SparkHiveToHdfs \
--jars /home/dev/dbJars/datanucleus-api-jdo-3.2.6.jar,/home/dev/dbJars/datanucleus-rdbms-3.2.9.jar,/home/dev/dbJars/datanucleus-core-3.2.10.jar \
--master yarn-cluster \
--name DCA_SPARK_JOB \
/home/dev/dbJars/data-connector-spark.jar dev.emp
data-connector-spark.jar contains the following code:
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkHiveToHdfs {
    public static void main(String[] args) throws Exception {
        // Fully qualified table name, e.g. "dev.emp", passed as the first argument.
        String hiveTableNameWithSchema = args[0];
        SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
        SparkContext sc = new SparkContext(conf);
        HiveContext hc = new HiveContext(sc);
        // Query the Hive table through the metastore and print its schema.
        DataFrame df = hc.sql("select * from " + hiveTableNameWithSchema);
        df.printSchema();
    }
}
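The class name suggests the table should also be written to HDFS, but the code above stops at printSchema(). A minimal sketch of that missing write step, assuming Parquet output and a hypothetical target path (neither is in the original code), could be appended to main():

import org.apache.spark.sql.SaveMode;

// Hypothetical write step: persist the queried table to HDFS as Parquet.
// Output path and format are assumptions, not part of the original job.
df.write().mode(SaveMode.Overwrite).parquet("hdfs:///tmp/" + hiveTableNameWithSchema);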
Properties in hive-site.xml under $SPARK_HOME/conf:
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800</value>
</property>
<property>
  <name>hive.metastore.connect.retries</name>
  <value>24</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://xxxx:9083</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.transport.mode</name>
  <value>binary</value>
</property>
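Even with these values in place under $SPARK_HOME/conf, the file has to be visible to the driver at runtime. One quick in-app check (a diagnostic sketch, not part of the original code) is to list the databases the HiveContext can actually see before querying; if only "default" comes back, the job has silently fallen back to a local embedded metastore instead of the one configured above:

// Diagnostic sketch: if the configured metastore is reachable, "dev" should
// appear in this listing; a default-only list means hive-site.xml was not loaded.
hc.sql("show databases").show();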
Error log:
ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: Table not found: `dev`.`emp`; line 1 pos 18
org.apache.spark.sql.AnalysisException: Table not found: `dev`.`emp`; line 1 pos 18
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.impetus.idw.data.connector.SparkHiveToHdfs.main(SparkHiveToHdfs.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
Answer 0 (score: 2)
Try passing hive-site.xml in the spark-submit command:
spark-submit \
--class com.dev.SparkHiveToHdfs \
--jars /home/dev/dbJars/datanucleus-api-jdo-3.2.6.jar,/home/dev/dbJars/datanucleus-rdbms-3.2.9.jar,/home/dev/dbJars/datanucleus-core-3.2.10.jar \
--master yarn-cluster \
--name DCA_SPARK_JOB \
--files hive-site.xml \
/home/dev/dbJars/data-connector-spark.jar dev.emp
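This works because in yarn-cluster mode the driver runs inside the YARN ApplicationMaster on a cluster node, so a hive-site.xml that exists only in the local $SPARK_HOME/conf never reaches it. Without the metastore URIs, the HiveContext falls back to a local embedded metastore that contains no dev.emp, which is exactly the "Table not found" error above. --files ships hive-site.xml into the working directory of each YARN container, where it is on the classpath and gets picked up.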