Question

I'm trying to implement a simple Apache Spark RDD system, but I can't seem to access the session.

I first ran ./start-all.sh from /usr/local/spark/sbin.

Then I created a new session as follows:

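import shutil

from pyspark.sql import SparkSession

# Create the Spark session, or reuse one that already exists
spark = (SparkSession.builder
         .appName("Oncofinder -- Preprocessing")
         .getOrCreate())

# Zip the application package and ship it to the workers
dirname = "oncofinder"
zipname = dirname + ".zip"
shutil.make_archive(dirname, 'zip', dirname + "/..", dirname)
spark.sparkContext.addPyFile(zipname)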

This also ships a fresh copy of my application package to the Spark workers.

I'm using the Python library pyspark.

Then I passed my Spark session to a function called preprocess:

train_rdd = preprocess(spark, [1, 2], tile_size=tile_size, sample_size=sample_size,
                       grayscale=grayscale, num_partitions=num_partitions, folder=folder)

And here is my function:

def preprocess(spark, slide_nums, folder="data", training=True, tile_size=1024, overlap=0,
               tissue_threshold=0.9, sample_size=256, grayscale=False, normalize_stains=True,
               num_partitions=20000):

    print("===PREPROCESSING===")

    slides = (spark.sparkContext
              .parallelize(slide_nums)
              .filter(lambda slide: open_slide(slide, folder, training) is not None))

When I run this code, I get:

2018-11-27 00:36:30 WARN  Utils:66 - Your hostname, luiscosta-GT62VR-6RD resolves to a loopback address: 127.0.1.1; using 192.168.1.67 instead (on interface wlp2s0)
2018-11-27 00:36:30 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder/lib/python3.6/site-packages/pyspark/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-11-27 00:36:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
===PREPROCESSING===

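It reaches my ===PREPROCESSING=== checkpoint, but it never runs my open_slide function.

I'm new to Apache Spark, so I apologize if this is a silly question, but when I read the documentation it all looked very straightforward.

Kind regards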
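
A note on the symptom described above, offered as a possible explanation rather than a confirmed diagnosis: in Spark, RDD transformations such as filter are lazy, so the lambda passed to filter only runs once an action is called on the resulting RDD. The part of preprocess shown here contains no action. The sketch below is a plain-Python analogy, not Spark code: it uses the built-in filter, which is also lazy, and open_slide_stub is a hypothetical stand-in for open_slide.

```python
# Plain-Python analogy (not Spark): the built-in filter() is lazy, like RDD.filter().
calls = []  # records every slide number the predicate was actually invoked on


def open_slide_stub(slide):
    # Hypothetical stand-in for open_slide; it only records the call.
    calls.append(slide)
    return slide


# Building the pipeline does not invoke the predicate.
slides = filter(lambda slide: open_slide_stub(slide) is not None, [1, 2])
calls_before = list(calls)

# Consuming the pipeline (the analogue of a Spark action) does invoke it.
result = list(slides)
calls_after = list(calls)

print(calls_before)
print(calls_after)
```

In Spark, the analogous step would be calling an action such as slides.count() or slides.collect() on the RDD. Whether the rest of preprocess does so is not shown in the question.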

Apache Spark does not create a new session

0 answers: