I have been trying to set up and run a simple Java Apache Spark program in IntelliJ on Windows, but I keep hitting an error I cannot resolve. I added Spark through Maven. This is the error I get:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/03/20 23:53:23 INFO SparkContext: Running Spark version 2.0.0-cloudera1-SNAPSHOT
19/03/20 23:53:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/20 23:53:24 INFO SecurityManager: Changing view acls to: Drakker
19/03/20 23:53:24 INFO SecurityManager: Changing modify acls to: Drakker
19/03/20 23:53:24 INFO SecurityManager: Changing view acls groups to:
19/03/20 23:53:24 INFO SecurityManager: Changing modify acls groups to:
19/03/20 23:53:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Drakker); groups with view permissions: Set(); users with modify permissions: Set(Drakker); groups with modify permissions: Set()
19/03/20 23:53:25 INFO Utils: Successfully started service 'sparkDriver' on port 50007.
19/03/20 23:53:25 INFO SparkEnv: Registering MapOutputTracker
19/03/20 23:53:25 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:212)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:194)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:260)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:429)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at Spark.App.main(App.java:16)
19/03/20 23:53:25 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:212)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:194)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:260)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:429)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at Spark.App.main(App.java:16)
I tried setting the driver memory manually, but it did not work. I also tried installing Spark locally, but changing the driver memory from the command prompt did not help either.
Here is the code:
package Spark;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;
import java.util.List;
public class App
{
    public static void main( String[] args )
    {
        SparkConf conf = new SparkConf().setAppName("Spark").setMaster("local");
        // conf.set("spark.driver.memory","471859200");
        JavaSparkContext sc = new JavaSparkContext(conf);
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        JavaRDD<Integer> rdd = sc.parallelize(data);
        JavaRDD<Integer> list = rdd.map(s -> s);
        int totalLines = list.reduce((a, b) -> a + b);
        System.out.println(totalLines);
    }
}
The error occurs when the JavaSparkContext is instantiated. Does anyone know how to solve this?
Thanks!
Answer 0 (score: 1)
I am a bit puzzled by your code, because it mixes pre-Spark 2.x constructs like SparkConf with a lot of RDDs. There is nothing wrong with using them, but things have changed since Spark 2.x.
Here is an example using SparkSession and dataframes, which are (in short) a superset and more powerful version of RDDs.
In the example you will see several ways to do the map/reduce operation: two using map/reduce and one using a simpler SQL-like syntax.
int totalLines = df
.map(
(MapFunction<Row, Integer>) row -> row.<Integer>getAs("i"),
Encoders.INT())
.reduce((a, b) -> a + b);
System.out.println(totalLines);
totalLines = df
.map(
(MapFunction<Row, Integer>) row -> row.getInt(0),
Encoders.INT())
.reduce((a, b) -> a + b);
System.out.println(totalLines);
This one is probably the most popular:
long totalLinesL = df.selectExpr("sum(*)").first().getLong(0);
System.out.println(totalLinesL);
package net.jgp.books.spark.ch07.lab990_others;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
/**
* Simple ingestion followed by map and reduce operations.
*
* @author jgp
*/
public class SelfIngestionApp {
/**
* main() is your entry point to the application.
*
* @param args
*/
public static void main(String[] args) {
SelfIngestionApp app = new SelfIngestionApp();
app.start();
}
/**
* The processing code.
*/
private void start() {
// Creates a session on a local master
SparkSession spark = SparkSession.builder()
.appName("Self ingestion")
.master("local[*]")
.getOrCreate();
Dataset<Row> df = createDataframe(spark);
df.show(false);
// map and reduce with getAs()
int totalLines = df
.map(
(MapFunction<Row, Integer>) row -> row.<Integer>getAs("i"),
Encoders.INT())
.reduce((a, b) -> a + b);
System.out.println(totalLines);
// map and reduce with getInt()
totalLines = df
.map(
(MapFunction<Row, Integer>) row -> row.getInt(0),
Encoders.INT())
.reduce((a, b) -> a + b);
System.out.println(totalLines);
// SQL-like
long totalLinesL = df.selectExpr("sum(*)").first().getLong(0);
System.out.println(totalLinesL);
}
private static Dataset<Row> createDataframe(SparkSession spark) {
StructType schema = DataTypes.createStructType(new StructField[] {
DataTypes.createStructField(
"i",
DataTypes.IntegerType,
false) });
List<Integer> data =
Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9);
List<Row> rows = new ArrayList<>();
for (int i : data) {
rows.add(RowFactory.create(i));
}
return spark.createDataFrame(rows, schema);
}
}
Answer 1 (score: 1)
Driver Memory Exception
This happens when the Spark driver runs out of memory, i.e. when the application master that launches the driver exceeds its memory limit and YARN kills the process.
Error message: java.lang.OutOfMemoryError
Solution: increase the driver memory by setting the following:
--conf spark.driver.memory=<XY>g
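For example (a rough sketch that is not part of this answer; "1g" is just a placeholder value), the same setting can be passed programmatically on the SparkConf. Keep in mind that spark.driver.memory only takes effect if it is set before the driver JVM starts (e.g. via spark-submit); when you run in local mode from an IDE, the JVM is already running, so you have to raise its own heap with -Xmx instead:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverMemoryExample {
    public static void main(String[] args) {
        // Request 1g of driver memory. In local mode launched from an IDE the
        // driver JVM is already running, so this value cannot resize the existing
        // heap; increase -Xmx (or use spark.testing.memory) there instead.
        SparkConf conf = new SparkConf()
            .setAppName("DriverMemoryExample")
            .setMaster("local")
            .set("spark.driver.memory", "1g");

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Requested driver memory: " + conf.get("spark.driver.memory"));
        sc.stop();
    }
}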
Answer 2 (score: 0)
You could try using the SparkSession builder; you can then get the Spark context via spark.sparkContext().
public static SparkSession sparkSession(String master,
String appName) {
return SparkSession.builder().appName(appName)
.master(master)
.config("spark.dynamicAllocation.enabled", true)
.config("spark.shuffle.service.enabled", true)
.config("spark.driver.maxResultSize", "8g")
.config("spark.executor.memory", "8g")
.config("spark.executor.cores", "4")
.config("spark.cores.max", "6")
.config("spark.submit.deployMode", "client")
.config("spark.network.timeout", "3600s")
.config("spark.eventLog.enabled", true)
.getOrCreate();
}
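For instance (a minimal usage sketch, not part of the answer above; SparkSessionFactory is a hypothetical name for whatever class holds the sparkSession(...) helper), you could get a JavaSparkContext back from the session and keep using the RDD code from the question:

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SessionUsageExample {
    public static void main(String[] args) {
        // Build the session with the helper above (hypothetical class name).
        SparkSession spark = SparkSessionFactory.sparkSession("local[*]", "Spark");

        // Wrap the underlying SparkContext so the question's RDD code still works.
        JavaSparkContext sc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println(rdd.reduce((a, b) -> a + b)); // prints 45

        spark.stop();
    }
}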
Answer 3 (score: 0)
If you are using Eclipse, go to Run > Run Configurations... > Arguments > VM arguments and set a larger max heap size, e.g. -Xmx512m.
In IntelliJ you can set Run/Debug Configurations > VM options: -Xmx512m.
In your code you can try conf.set("spark.testing.memory", "2147480000").
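Put together with the code from the question, that last suggestion might look like this (a sketch; 2147480000 bytes is roughly 2 GB and only overrides Spark's minimum-memory check rather than actually adding memory):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("Spark")
            .setMaster("local")
            // spark.testing.memory overrides the system-memory value Spark checks at
            // startup (normally Runtime.getRuntime().maxMemory()), so the
            // "must be at least 471859200" check passes.
            .set("spark.testing.memory", "2147480000");

        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println(rdd.reduce((a, b) -> a + b)); // prints 45
        sc.stop();
    }
}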