ImportError: No module named scipy.stats._continuous_distns

Asked: 2019-11-19 22:34:10

Tags: apache-spark pyspark

I have a Spark job that, as its final step, writes a dataframe into an internal table with a given name using saveAsTable.

The dataframe is built up in several steps, one of which uses scipy's "beta" method (imported via from scipy.stats import beta). The job runs on Google Cloud with 20 worker nodes, but it fails with the following error complaining about the scipy package:

  Caused by: org.apache.spark.SparkException: 
  Job aborted due to stage failure: 
  Task 14 in stage 7.0 failed 4 times, most recent failure: 
  Lost task 14.3 in stage 7.0 (TID 518, name-w-3.c.somenames.internal, 
  executor 23): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 364, in main
  func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
  command = serializer._read_with_length(file)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 172, in 
  _read_with_length
  return self.loads(obj)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 583, in loads
  return pickle.loads(obj)
  ImportError: No module named scipy.stats._continuous_distns
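
For reference, the step that pulls in scipy looks roughly like this (an illustrative sketch, not the actual code; the column names, table name, and UDF are made up):

from scipy.stats import beta
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# The UDF closes over scipy's beta distribution, so Spark pickles it
# and every executor must import scipy.stats to unpickle it.
@udf(DoubleType())
def beta_cdf(x, a, b):
    return float(beta.cdf(x, a, b))

df = df.withColumn("score", beta_cdf("x", "a", "b"))
df.write.saveAsTable("my_table")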

Any ideas or solutions?

I have also tried passing the libraries to the Spark job via:

"spark.driver.extraLibraryPath" : "/usr/lib/spark/python/lib/pyspark.zip",
"spark.driver.extraClassPath" :"/usr/lib/spark/python/lib/pyspark.zip" 

1 Answer:

Answer 0 (score: 0):

Is the library installed on all the nodes of the cluster? The extraLibraryPath/extraClassPath settings you tried only affect the driver's JVM, not the executors' Python environment. You can simply install the package on every node, for example with pip:

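# a sketch, assuming pip is the Python package manager on the nodes
pip install scipy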

I do this in AWS EMR using a bootstrap action; it should work in a similar way on Google Cloud.
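
On Dataproc, for example, something along these lines should work (a sketch; the cluster name, region, and the public pip-install initialization action are assumptions to adapt):

gcloud dataproc clusters create my-cluster \
  --region us-central1 \
  --initialization-actions gs://goog-dataproc-initialization-actions-us-central1/python/pip-install.sh \
  --metadata PIP_PACKAGES=scipy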