PySpark Python issue: Py4JJavaError: An error occurred while calling o48.showString

Date: 2017-12-11 21:32:49

Tags: python-3.x pyspark

Hi everyone, I am using PySpark with Python. I have included my code below and ran into a problem. Does anyone know what causes it?

import csv
import io
with io.open('Workbook2.csv', 'r', encoding='utf8') as infile:
    ipFile = csv.DictReader((x.replace(u"\uFEFF", u" ") for x in infile))
....

This is the part of my code that returns a boolean (true/false) value. It worked fine the first time I ran it, but after restarting the kernel I get the error above.

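from pyspark.sql import Window
from pyspark.sql.functions import lag

# Window over rows that share the same id, ordered by id
windowSpec = Window.partitionBy(df_Broadcast['id']).orderBy(df_Broadcast['id'])

# Previous row's id within the window (null for the first row of each partition)
IdShift = lag(df_Broadcast["id"]).over(windowSpec).alias('IdShift')

# Compare each id with the previous row's id (null when there is no previous row)
df_Broadcast = df_Broadcast.withColumn('CheckId', df_Broadcast['id'] != IdShift)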

df_Broadcast.show()

2 Answers:

Answer 0: (score: 8)

The error is

Caused by: java.lang.OutOfMemoryError: Java heap space

You need more memory to run the operation and avoid the OOM error.
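
As a rough sketch of what "more memory" can look like in PySpark: the driver heap is controlled by `spark.driver.memory` and the executor heap by `spark.executor.memory`. The `4g` values, the app name, and the helpers `memory_conf` and `build_session` are illustrative assumptions, not values from the question. The driver setting only takes effect if it is applied before the driver JVM starts, so in a notebook the kernel would need to be restarted first.

```python
def memory_conf(driver_memory="4g", executor_memory="4g"):
    """Return Spark memory settings as a plain dict (hypothetical helper)."""
    return {
        "spark.driver.memory": driver_memory,
        "spark.executor.memory": executor_memory,
    }


def build_session(app_name="oom-example", **kwargs):
    """Create a SparkSession with the memory settings applied."""
    # Imported lazily so memory_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in memory_conf(**kwargs).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `spark = build_session(driver_memory="6g")` in a fresh Python process would then start the session with a larger driver heap. How much is actually needed depends on the data size and the machine.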

Answer 1: (score: 0)

This problem was caused by the Java version. I had Spark 2.3.3 with Java 11. I removed Java 11 and installed Java 8.

That solved the problem.

*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/java/jdk1.8.0_202/"
update-alternatives: --install needs <link> <name> <path> <priority>

Use 'update-alternatives --help' for program usage information.
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/java/jdk1.8.0_202/" 1
update-alternatives: using /opt/java/jdk1.8.0_202/ to provide /usr/bin/java (java) in auto mode
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/java/jdk1.8.0_202/bin/java" 1
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/javac" "javac" "/opt/java/jdk1.8.0_202/bin/javac" 1 
update-alternatives: using /opt/java/jdk1.8.0_202/bin/javac to provide /usr/bin/javac (javac) in auto mode
*********@*********-VirtualBox:/opt/java$ java -version

Command 'java' not found, but can be installed with:

sudo apt install default-jre            
sudo apt install openjdk-11-jre-headless
sudo apt install openjdk-8-jre-headless 

*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/opt/java/jdk1.8.0_202/bin/javaws" 1
update-alternatives: using /opt/java/jdk1.8.0_202/bin/javaws to provide /usr/bin/javaws (javaws) in auto mode
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --install "/usr/bin/jar" "jar" "/opt/java/jdk1.8.0_202/bin/jar" 1
update-alternatives: using /opt/java/jdk1.8.0_202/bin/jar to provide /usr/bin/jar (jar) in auto mode
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --set "java" "/opt/java/jdk1.8.0_202/bin/java"
update-alternatives: using /opt/java/jdk1.8.0_202/bin/java to provide /usr/bin/java (java) in manual mode
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --set "javac" "/opt/java/jdk1.8.0_202/bin/javac" 
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --set "javaws" "/opt/java/jdk1.8.0_202/bin/javaws"
*********@*********-VirtualBox:/opt/java$ sudo update-alternatives --set "jar" "/opt/java/jdk1.8.0_202/bin/jar"
*********@*********-VirtualBox:/opt/java$ cd
*********@*********-VirtualBox:~$ java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
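
The Java version on the `PATH` can also be checked from Python before starting a session. The sketch below is not part of Spark: `parse_java_major` and `installed_java_major` are hypothetical helper names, and the parser only handles the common version string shapes such as `1.8.0_202` and `11.0.2`. It inspects only the `java` found on the `PATH`; if `JAVA_HOME` is set, Spark uses that installation instead. Spark 2.x expects Java 8, and Java 11 support arrived with Spark 3.0.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_202" means Java 8; newer scheme: "11.0.2" means 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stderr)
```

For the final output in the log above, `parse_java_major('java version "1.8.0_202"')` returns 8, which matches what Spark 2.3.3 needs.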