Question

我正在使用以下代码玩PySpark：

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Scoring System").getOrCreate()

df = spark.read.csv('output.csv')

df.show()

在命令行上运行python trial.py之后，大约5至10分钟，没有任何进展：

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-05-05 22:58:31 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-05-05 22:58:32 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[Stage 0:>                                                          (0 + 0) / 1]2019-05-05 23:00:08 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:23 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:38 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:53 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[Stage 0:>                                                          (0 + 0) / 1]2019-05-05 23:01:08 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:01:23 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:01:38 WARN  YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

我很想知道自己的工作节点（？）中缺少资源，还是缺少什么？

Answer 1

尝试增加执行者和记忆的数量 pyspark --num-executors 5 --executor-memory 1G

PySpark连接慢

1 个答案: