I'm new to big data.
I installed everything with brew.
After setting MySQL as the metastore, I set

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>

in hive-site.xml.
With Hive on MR everything works perfectly.
In the Hive CLI I can create databases and tables and run SELECTs, but an INSERT fails with the error below:
➜ conf hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/3.1.1/libexec/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = aa5104a5-cc5c-4081-8cb4-198d17b22e55
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/3.1.1/libexec/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Hive Session ID = 1f84a3d7-16f6-48dd-bbe7-bcd78982fa72
hive> use sparktest;
OK
Time taken: 1.054 seconds
hive> select * from student;
OK
1 Xueqian F 23
2 Weiliang M 24
Time taken: 1.577 seconds, Fetched: 2 row(s)
hive> insert into student values(2,'Weiliang','M',25);
Query ID = wyx_20190224200349_87e1103b-dbfe-4761-aa6e-40bed1811399
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session c3fb44fc-eada-4878-a86e-9b339787e207)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session c3fb44fc-eada-4878-a86e-9b339787e207
hive>
In hive-site.xml I can see that hive.exec.reducers.bytes.per.reducer and the other parameters mentioned above are already set, so why does Hive ask me to set them again?
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>256000000</value>
  <description>size per reducer.The default is 256Mb, i.e if the input size is 1G, it will use 4 reducers.</description>
</property>
<property>
  <name>hive.exec.reducers.max</name>
  <value>1009</value>
  <description>
    max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is
    negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.
  </description>
</property>
I think my Spark build supports Hive:
➜ ~ spark-shell
2019-02-24 20:29:53 WARN Utils:66 - Your hostname, wuyuxideMacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface en0)
2019-02-24 20:29:53 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-24 20:29:54 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.100:4040
Spark context available as 'sc' (master = local[*], app id = local-1551011399409).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala>
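The only other check I know of is to ask a session which catalog implementation it reports and which databases it can see; this is just my own sketch (the app name is arbitrary), so I'm not sure it is the right test:

from pyspark.sql import SparkSession

# Sanity check (my own sketch): if the session is really talking to the
# MySQL-backed metastore, I expect "hive" here and sparktest in the list.
spark = (SparkSession.builder
         .appName("catalog-check")   # arbitrary name
         .enableHiveSupport()        # request the Hive catalog
         .getOrCreate())

print(spark.conf.get("spark.sql.catalogImplementation", "in-memory"))
print([db.name for db in spark.catalog.listDatabases()])
spark.stop()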
I use pyspark==2.4.0 to access Hive from Spark:
from pyspark import SparkContext
from pyspark.sql import HiveContext

with SparkContext() as sc:
    hive_context = HiveContext(sc)
    hive_context.sql('SELECT * FROM sparktest.student').show()
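
For comparison, my understanding is that the newer SparkSession API would spell the same query roughly like the sketch below, with enableHiveSupport() asking Spark to use the Hive metastore (just a sketch, I have not verified it behaves any differently in my setup):

from pyspark.sql import SparkSession

# Sketch of what I understand the SparkSession-based equivalent looks like.
# enableHiveSupport() should make Spark use the Hive metastore instead of
# its in-memory catalog; presumably hive-site.xml (or at least the metastore
# connection settings) must be visible to Spark for it to find the table.
spark = (SparkSession.builder
         .appName("hive-read")   # arbitrary name
         .enableHiveSupport()
         .getOrCreate())

spark.sql('SELECT * FROM sparktest.student').show()
spark.stop()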
The HiveContext script above fails, saying it cannot find sparktest.student:
2019-02-24 20:59:31 WARN Utils:66 - Your hostname, wuyuxideMacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface en0)
2019-02-24 20:59:31 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-24 20:59:31 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-24 20:59:36 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
2019-02-24 20:59:36 WARN ObjectStore:568 - Failed to get database sparktest, returning NoSuchObjectException
Traceback (most recent call last):
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o23.sql.
: org.apache.spark.sql.AnalysisException: Table or view not found: `sparktest`.`student`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `sparktest`.`student`
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:90)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/wyx/project/py3.7aio/spark/hive.py", line 13, in <module>
hive_context.sql('SELECT * FROM sparktest.student').show()
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/context.py", line 358, in sql
return self.sparkSession.sql(sqlQuery)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Table or view not found: `sparktest`.`student`; line 1 pos 14;\n'Project [*]\n+- 'UnresolvedRelation `sparktest`.`student`\n"
Process finished with exit code 1