I bought a book to try to learn Spark. After downloading it and following the steps, I am running into problems loading spark-shell and pyspark. Wondering if anyone can point out what I need to do to get spark-shell or pyspark to run.
Here is what I did.
I created the folder C:\spark and put all the files from the Spark tar into it.
I also created c:\hadoop\bin and put winutils.exe in that folder.
Then I did the following:
> set SPARK_HOME=c:\spark
> set HADOOP_HOME=c:\hadoop
> set PATH=%SPARK_HOME%\bin;%PATH%
> set PATH=%HADOOP_HOME%\bin;%PATH%
> set PYTHONPATH=C:\Users\AppData\Local\Continuum\anaconda3
I created C:\tmp\hive and did the following:
> cd c:\hadoop\bin
> winutils.exe chmod -R 777 C:\tmp\hive
I also did the following:
> set PYSPARK_PYTHON=C:\Users\AppData\Local\Continuum\anaconda3\python
> set PYSPARK_DRIVER_PYTHON=C:\Users\AppData\Local\Continuum\anaconda3\ipython
Also, a quick question: I tried to check and confirm what the SPARK_HOME environment variable is set to by doing the following (I think this is how to do it. Is this the right way to see whether I set the environment variable correctly?):
>echo %SPARK_HOME%
All I got back was %SPARK_HOME%.
I also did:
>echo %PATH%
I did not see %SPARK_HOME%\bin or %HADOOP_HOME%\bin among the directories printed in CMD.
When I finally tried to run pyspark:
C:\spark\bin>pyspark
I got the following error message:
Missing Python executable 'C:\Users\AppData\Local\Continuum\anaconda3\python', defaulting to 'C:\spark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
'C:\Users\AppData\Local\Continuum\anaconda3\ipython' is not recognized as an internal or external command, operable program or batch file.
And when I tried to run spark-shell:
C:\spark\bin>spark-shell
I got the following error message:
Missing Python executable 'C:\Users\AppData\Local\Continuum\anaconda3\python', defaulting to 'C:\spark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
'C:\Users\AppData\Local\Continuum\anaconda3\ipython' is not recognized as an internal or external command, operable program or batch file.
2018-08-19 18:29:01 ERROR Shell:397 - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2467)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:220)
    at org.apache.spark.deploy.SparkSubmit$.secMgr$lzycompute$1(SparkSubmit.scala:408)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:408)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit$.doPrepareSubmitEnvironment(SparkSubmit.scala:415)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:250)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:171)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-08-19 18:29:01 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2018-08-19 18:29:08 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://NJ1-BCTR-10504.usa.fxcorp.prv:4041
Spark context available as 'sc' (master = local[*], app id = local-1534717748215).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Answer 0 (score: 0)
I found the following missing from your setup:
1. Apache Spark needs Java 1.6 or later, so make sure you install the JDK (latest version) and set the environment variable path for Java, e.g. C:\Program Files\Java\jdk1.8.0_172\bin
Try running the simple Java command below at the cmd prompt to verify that Java is properly installed on your machine:
java -version
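Setting the Java path in cmd might look like this session-local sketch (JDK path taken from the example above; adjust to your install):
> set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_172
> set PATH=%JAVA_HOME%\bin;%PATH%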
Once Java is installed successfully, set your Spark environment variable to C:\Spark.
Since you are running Spark on your local system, there is no need to set HADOOP_HOME, as Spark can run with its own standalone resource manager.
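For instance, one way to make the variable stick across console windows is setx, sketched below (setx writes the value for future sessions only, so open a fresh cmd window before checking; if echo prints back the literal %SPARK_HOME%, the variable is not set in that session):
> setx SPARK_HOME C:\Spark
> rem open a new cmd window, then verify:
> echo %SPARK_HOME%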
2. To get pyspark working, you may have to install the pyspark Python package:
pip install pyspark
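Once installed, a minimal smoke test along these lines can confirm the package works (a sketch, assuming the pip-installed pyspark; the app name is arbitrary):
from pyspark.sql import SparkSession

# start a local Spark session and run a trivial job
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
print(spark.range(5).count())  # expected output: 5
spark.stop()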
Log settings (nice to have): I see your logs are very verbose; this can be controlled via the log4j.properties file under the spark/conf folder so that INFO messages are not shown.
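The usual tweak is sketched below (assuming you first copy the conf\log4j.properties.template shipped with Spark 2.x to conf\log4j.properties):
# in %SPARK_HOME%\conf\log4j.properties: raise the root logger level so INFO lines are suppressed
log4j.rootCategory=WARN, console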
Answer 1 (score: 0)
I would download the VM from https://mapr.com/products/mapr-sandbox-hadoop/, which comes with everything configured (a fully configured node).
You will be able to use Spark with HDFS, Hive, and any other tools.