What I want to do: save an MLlib model
The code I run in Spark:
from pyspark.mllib.tree import RandomForest

# Train a random forest classifier on the prepared training RDD
model = RandomForest.trainClassifier(train_data,
                                     numClasses=2, categoricalFeaturesInfo=categoricalFeaturesInfo,
                                     numTrees=numTrees, featureSubsetStrategy="auto",
                                     impurity=impurity, maxDepth=maxDepth, maxBins=maxBins)
# Persist the trained model to the local filesystem
model.save(sc, "file:///path/to/models/model_name")
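For completeness, the persisted model can be read back in a later session; a minimal sketch, assuming the same Spark 1.6 MLlib API and path as above (test_data is a hypothetical RDD of LabeledPoint):
from pyspark.mllib.tree import RandomForestModel

# Reload the model saved above in a later session
same_model = RandomForestModel.load(sc, "file:///path/to/models/model_name")
# Hypothetical usage: score the feature vectors of test_data
predictions = same_model.predict(test_data.map(lambda lp: lp.features))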
The error message is:
native snappy library not available: this version of libhadoop was built without snappy support
Spark version: 1.6.1
The command used to launch Spark:
pyspark --packages com.databricks:spark-csv_2.11:1.5.0 --master "local[8]" --driver-memory 6G --executor-memory 6G --jars /usr/local/path/to/hadoop/lib/snappy-java-1.0.4.1.jar
Environment variables (spark_env.sh):
HADOOP_HOME=/usr/local/path/to/hadoop
SPARK_HOME=/usr/local/path/to/spark
HADOOP_CONF_DIR=/usr/local/path/to/hadoop/etc/hadoop
SPARK_CONF_DIR=/usr/local/path/to/spark/conf
HADOOP_LZO_DIR=/usr/local/path/to/hadoop/lib
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/path/to/spark/*:/usr/local/path/to/spark/lib/*:/usr/local/path/to/hadoop/lib/*:/usr/local/path/to/hadoop/lib/native/*
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/path/to/hadoop/lib/native
SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/path/to/hadoop/lib/native/*:/usr/local/path/to/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/local/path/to/spark/lib/*:$CLASSPATH
I deliberately added /usr/local/path/to/hadoop/lib/snappy-java-1.0.4.1.jar to SPARK_CLASSPATH, but to no avail.
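One way to confirm what the error is complaining about is Hadoop's own native-library check; a quick sketch, assuming the hadoop binary on the PATH comes from the same build that Spark loads:
# Lists which native codecs this libhadoop build supports
hadoop checknative -a
# A build without snappy support reports a line similar to:  snappy: false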
Answer (score: 1)
The following steps helped me solve a similar problem:
Download the Hadoop binaries from https://hadoop.apache.org/releases.html, unpack them locally, and set spark.driver.extraLibraryPath to $HADOOP_PATH/lib/native.
Example:
pyspark --packages com.databricks:spark-csv_2.11:1.5.0 --master "local[8]" --conf "spark.driver.extraLibraryPath=/home/hadoop/hadoop-2.8.1/lib/native" --driver-memory 6G --executor-memory 6G --jars /usr/local/path/to/hadoop/lib/snappy-java-1.0.4.1.jar
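If you prefer not to pass the property on every launch, it can also live in spark-defaults.conf; a sketch assuming the SPARK_CONF_DIR listed above and the same example native-library path:
# $SPARK_CONF_DIR/spark-defaults.conf  (native-library path is the example one from above)
spark.driver.extraLibraryPath    /home/hadoop/hadoop-2.8.1/lib/native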
If that alone does not work for you, you can also try setting spark.executor.extraLibraryPath to the same value.
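A sketch of the launch command with both properties set, reusing the example paths from above:
pyspark --packages com.databricks:spark-csv_2.11:1.5.0 --master "local[8]" \
  --conf "spark.driver.extraLibraryPath=/home/hadoop/hadoop-2.8.1/lib/native" \
  --conf "spark.executor.extraLibraryPath=/home/hadoop/hadoop-2.8.1/lib/native" \
  --driver-memory 6G --executor-memory 6G --jars /usr/local/path/to/hadoop/lib/snappy-java-1.0.4.1.jar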