I am trying to use Zeppelin with the following code:
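import java.net.URL
import java.nio.charset.Charset
import org.apache.commons.io.IOUtils

// Download the whole file to the driver, split it into lines, and distribute them.
val dataText = sc.parallelize(IOUtils.toString(new URL("http://XXX.XX.XXX.121:8090/my_data.txt"), Charset.forName("utf8")).split("\n"))
case class Data(id: String, time: Long, value1: Double, value2: Int, mode: Int)
// Parse the tab-separated fields, skipping the header row.
val dat = dataText.map(s => s.split("\t")).filter(s => s(0) != "Header:").map(
  s => Data(s(0),
    s(1).toLong,
    s(2).toDouble,
    s(3).toInt,
    s(4).toInt
  )
).toDF()
dat.registerTempTable("mydatatable")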
This keeps failing with the following error:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuilder.append(StringBuilder.java:204)
at org.apache.commons.io.output.StringBuilderWriter.write(StringBuilderWriter.java:138)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2002)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:896)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC.<init>(<console>:53)
at $iwC.<init>(<console>:55)
at <init>(<console>:57)
at .<init>(<console>:61)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
I have already set the following in zeppelin-env.sh:
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"
Any idea what I might be missing? The file I am parsing, my_data.txt, is about 200 MB.
By the way, in case it matters, I am using the Hortonworks Sandbox.
Edit 1
Here is my zeppelin-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=9995
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"
export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx4g"
export ZEPPELIN_INT_MEM="-Xmx4g"
export SPARK_HOME=/usr/hdp/2.3.0.0-2557/spark
Regards, Kiran
Answer 0 (score: 3)
You can try increasing the memory in SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh:
export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx20g"
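A complementary option, offered only as a sketch: the stack trace in the question fails inside IOUtils.toString, while the entire file is being appended to a single StringBuilder in the JVM that runs the notebook code. Reading the URL line by line avoids that one large intermediate string and the array produced by splitting it. The helper name readLines is hypothetical and not part of the original post. The lines are still collected in that same JVM, so this may lower peak heap use but does not remove the need for a heap large enough to hold the data.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URL
import java.nio.charset.StandardCharsets

// Hypothetical helper (not from the original post): read a text resource
// line by line so that no single String ever holds the whole file.
def readLines(url: URL): Vector[String] = {
  val reader = new BufferedReader(
    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))
  try {
    val lines = Vector.newBuilder[String]
    var line = reader.readLine()
    while (line != null) {
      lines += line
      line = reader.readLine()
    }
    lines.result()
  } finally {
    reader.close() // always release the underlying connection
  }
}
```

In the question's code, the result could be passed straight to sc.parallelize in place of the IOUtils.toString(...).split("\n") expression, for example sc.parallelize(readLines(new URL("http://XXX.XX.XXX.121:8090/my_data.txt"))).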
Answer 1 (score: 0)
Increasing the memory for the following zeppelin-env.sh variable worked for me. The defaults are 1 GB / 0.5 GB; I raised them to 10 GB / 5 GB:
"ZEPPELIN_MEM": "-Xmx10024m -XX:MaxPermSize=5120m"
Answer 2 (score: 0)
I got the following error when trying to start Zeppelin notebook:
INFO [2021-05-04 15:16:22,015] ({main} Folder.java[addNote]:185) - Add note 2G7CAFXX7 to folder /
INFO [2021-05-04 15:16:22,016] ({main} Notebook.java[<init>]:127) - Notebook indexing started...
WARN [2021-05-04 15:16:32,045] ({main} ContextHandler.java[log]:2355) - unavailable
MultiException stack 1 of 1
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:80)
at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:53)
To fix this, I adjusted the ZEPPELIN_MEM parameter in the zeppelin-env.sh file as follows:
export ZEPPELIN_MEM="-Xmx5024m -XX:MaxPermSize=5120m"
Then I restarted Zeppelin:
sudo systemctl stop zeppelin; sudo systemctl start zeppelin
Result:
INFO [2021-05-04 18:51:02,939] ({main} Folder.java[addNote]:185) - Add note 2G7CAFXX7 to folder /
INFO [2021-05-04 18:51:02,940] ({main} Notebook.java[<init>]:127) - Notebook indexing started...
INFO [2021-05-04 18:51:05,793] ({main} LuceneSearch.java[addIndexDocs]:305) - Indexing 905 notebooks took 2853ms
INFO [2021-05-04 18:51:05,793] ({main} Notebook.java[<init>]:129) - Notebook indexing finished: 905 indexed in -2s
INFO [2021-05-04 18:51:05,795] ({main} Helium.java[loadConf]:103) - Add helium local registry /usr/lib/zeppelin/helium
INFO [2021-05-04 18:51:05,797] ({main} Helium.java[loadConf]:100) - Add helium
INFO [2021-05-04 18:51:06,631] ({main} Server.java[doStart]:407) - Started @131632ms
INFO [2021-05-04 18:51:06,631] ({main} ZeppelinServer.java[main]:249) - Done, zeppelin server started
Answer 3 (score: -1)
The only thing that worked for me (using Spark 2) was adding this to conf/zeppelin-env.sh:
export SPARK_SUBMIT_OPTIONS="... --driver-memory 4g ..."
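Then restart the Zeppelin interpreter (in Zeppelin for Spark 2, click the settings button in the top-right corner, click the Interpreter link, scroll down, and click the Restart button in the Spark section).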
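After the restart, one way to check whether the new setting actually reached the JVM is to print its maximum heap from a notebook paragraph. This is a minimal sketch that uses only the standard java.lang.Runtime API; it assumes the paragraph runs in the Spark interpreter process, which is where the driver-side code from the question executes. The reported figure can be somewhat lower than the configured -Xmx value.

```scala
// Maximum heap the current JVM will attempt to use, in megabytes.
val maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"Max heap for this JVM: $maxHeapMb MB")
```

If the number does not change after editing zeppelin-env.sh and restarting, the setting is probably not being picked up by that process.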