Cannot run program "python": error=2, No such file or directory when running pyspark tests through a GitLab CI/CD pipeline

Asked: 2019-10-03 18:04:33

Tags: pyspark continuous-integration gitlab

I am running pyspark unit tests through a GitLab CI/CD pipeline and hitting the error above. Below is the test stage of my gitlab-ci.yml:

stages:
    - lint
    - unit
    - deploy

.test_template: &test_definition
    stage: unit
    before_script:
        - apk add --update --no-cache curl gcc linux-headers make musl-dev
        - pip install pipenv
        - pipenv install --dev --skip-lock
    except:
        - /.*working.*/

unit:
    <<: *test_definition
    tags:
        - docker
    script:
        - apk add --no-cache bash libc6-compat openjdk8 tar wget
        - ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2
        - wget -q https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz
        - tar -xzf spark-2.4.2-bin-hadoop2.7.tgz --directory /opt
        - export SPARK_HOME=/opt/spark-2.4.2-bin-hadoop2.7
        - export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
        - export AWS_DEFAULT_REGION=us-east-1
        - pipenv run python -m unittest discover
    except:
        - /.*working.*/
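
For context on why this can happen: Spark launches its Python worker processes with whatever executable is named by the PYSPARK_PYTHON environment variable (or the spark.pyspark.python conf), falling back to the literal command python. Since pipenv installs dependencies into its own virtualenv, the interpreter Spark's JVM tries to fork is not necessarily on the PATH the executors see. A minimal sketch of pinning it explicitly in the script section (using pipenv --py to resolve the virtualenv's interpreter is my assumption, not something verified on this runner):

        # Point Spark's Python workers at the same interpreter pipenv uses
        - export PYSPARK_PYTHON="$(pipenv --py)"
        - export PYSPARK_DRIVER_PYTHON="$(pipenv --py)"
        - pipenv run python -m unittest discover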

Running the pyspark tests with the pipeline configuration above throws the following error:

java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
    at org.apache.spark.sql.execution.python.BatchEvalPythonExec.evaluate(BatchEvalPythonExec.scala:77)
    at org.apache.spark.sql.execution.python.EvalPythonExec.$anonfun$doExecute$2(EvalPythonExec.scala:128)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:801)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:801)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 27 more
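
Reading the trace: PythonWorkerFactory.startDaemon hands the literal command python to ProcessBuilder.start, and error=2 is ENOENT from forkAndExec, i.e. the JVM cannot find any python executable on its PATH. An alternative to exporting variables in the CI script would be to pin the interpreter from inside the test suite before the first SparkSession is created; a sketch under the assumption that the tests build their own session (the local[2] master and app name here are placeholders, not taken from my project):

    import os
    import sys

    from pyspark.sql import SparkSession

    # Make executors fork the same interpreter that is running the tests;
    # sys.executable resolves to the pipenv virtualenv's python binary.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    spark = (
        SparkSession.builder
        .master("local[2]")              # placeholder master for unit tests
        .appName("pipeline-unit-tests")  # placeholder app name
        .getOrCreate()
    )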

0 Answers:

No answers yet