I am getting a java.lang.OutOfMemoryError when extracting data from a sparklyr table. I am running the code on a university computing cluster, so it should have plenty of spare memory to extract a single variable from my 1.48 GB database (or the whole database, when I collect it with collect()). I have tried many different Spark configurations, as described in https://github.com/rstudio/sparklyr/issues/379 and in "Running out of heap space in sparklyr, but have plenty of memory", but the problem persists.
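For reference, here is a minimal sketch of the kind of configuration I have been trying. The memory sizes and master = "local" are illustrative placeholders rather than known-good values for my cluster, and mtcars stands in for my real 1.48 GB table:

library(sparklyr)
library(dplyr)

# Placeholder memory sizes; the right values depend on the cluster.
conf <- spark_config()
conf$`sparklyr.shell.driver-memory` <- "8G"
conf$`sparklyr.shell.executor-memory` <- "8G"

# "local" is illustrative; on the cluster the master would differ.
sc <- spark_connect(master = "local", config = conf)

# mtcars stands in for the real 1.48 GB table.
cars_tbl <- sdf_copy_to(sc, mtcars, "cars", overwrite = TRUE)

# Pull a single variable back into R, mirroring the call that fails.
one_var <- cars_tbl %>% select(mpg) %>% collect()

spark_disconnect(sc)

The sparklyr.shell.* entries are passed through to spark-submit as --driver-memory and --executor-memory.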
Additionally, when I type
"request": {
"mbean": "*:*",
"type": "search"
},
"value": [
"kafka.network:name=ResponseQueueTimeMs,request=ListGroups,type=RequestMetrics",
"kafka.server:delayedOperation=topic,name=PurgatorySize,type=DelayedOperationPurgatory",
"kafka.server:delayedOperation=Fetch,name=NumDelayedOperations,type=DelayedOperationPurgatory",
"kafka.network:name=RemoteTimeMs,request=Heartbeat,type=RequestMetrics",
<-- SNIP -->
"kafka.network:name=LocalTimeMs,request=Offsets,type=RequestMetrics"
],
"timestamp": 1504188793,
"status": 200
}
java -version

at the terminal while connected to the cluster, I get the following output:

java version "1.7.0_141"
OpenJDK Runtime Environment (rhel-2.6.10.1.el6_9-x86_64 u141-b02)
OpenJDK 64-Bit Server VM (build 24.141-b02, mixed mode)

so I don't think the problem is with Java, as suggested in "How do I configure driver memory when running Spark in local mode via Sparklyr?".
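As a further sanity check on the Java side, here is a minimal base-R snippet (assuming java is on the R session's PATH) to confirm which JVM the session itself sees:

# Print the Java version visible to this R session
# (java -version writes to stderr).
system("java -version")

# Show which JAVA_HOME, if any, the session inherits.
Sys.getenv("JAVA_HOME")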