I am trying to implement a simple Apache Spark RDD pipeline, but I cannot seem to make use of the session.
I first ran:
./start-all.sh
from /usr/local/spark/sbin.
Then I created a new session by doing:
from pyspark.sql import SparkSession
import shutil

spark = (SparkSession.builder
         .appName("Oncofinder -- Preprocessing")
         .getOrCreate())

dirname = "oncofinder"
zipname = dirname + ".zip"
shutil.make_archive(dirname, 'zip', dirname + "/..", dirname)
spark.sparkContext.addPyFile(zipname)
which ships a fresh copy of my application package to the Spark workers.
I am using the Python library pyspark.
I then passed my Spark session to a function named preprocess:
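To double-check the packaging step, here is a minimal self-contained version of the same shutil.make_archive pattern (the mypkg directory is a throwaway created just for this demonstration):

```python
import os
import shutil
import tempfile
import zipfile

# Create a throwaway package directory to archive.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mypkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("VERSION = '0.1'\n")

# make_archive(base_name, format, root_dir, base_dir):
# archives root/mypkg relative to root, mirroring the call above.
zip_path = shutil.make_archive(os.path.join(root, "mypkg"), "zip", root, "mypkg")

with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
print(zip_path.endswith(".zip"))                       # True
print(any(n.endswith("__init__.py") for n in names))   # True
```

So the zip itself is created as expected before being handed to addPyFile.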
train_rdd = preprocess(spark, [1, 2], tile_size=tile_size, sample_size=sample_size,
                       grayscale=grayscale, num_partitions=num_partitions, folder=folder)
Here is the function:
def preprocess(spark, slide_nums, folder="data", training=True, tile_size=1024, overlap=0,
               tissue_threshold=0.9, sample_size=256, grayscale=False, normalize_stains=True,
               num_partitions=20000):
    print("===PREPROCESSING===")
    slides = (spark.sparkContext
              .parallelize(slide_nums)
              .filter(lambda slide: open_slide(slide, folder, training) is not None))
When I run this code, I get:
2018-11-27 00:36:30 WARN Utils:66 - Your hostname, luiscosta-GT62VR-6RD resolves to a loopback address: 127.0.1.1; using 192.168.1.67 instead (on interface wlp2s0)
2018-11-27 00:36:30 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder/lib/python3.6/site-packages/pyspark/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-11-27 00:36:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
===PREPROCESSING===
It reaches my ===PREPROCESSING=== checkpoint, but it never runs my open_slide function.
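While reading the docs I noticed that RDD transformations such as filter are lazy and only execute once an action (e.g. count() or collect()) is invoked on the RDD — could that be what is happening here? Python's built-in filter behaves the same way, as a quick pure-Python analogy:

```python
calls = []

def check(x):
    # Record that the predicate actually ran.
    calls.append(x)
    return x % 2 == 0

# Like a Spark transformation, filter() only builds a lazy pipeline:
lazy = filter(check, [1, 2, 3, 4])
print(calls)    # [] -- the predicate has not run yet

# Only consuming the result (the "action") triggers the predicate:
result = list(lazy)
print(calls)    # [1, 2, 3, 4]
print(result)   # [2, 4]
```

If that is the issue, I assume something like slides.count() would be needed in preprocess to actually force open_slide to run.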
I am new to Apache Spark, so I apologize if this is a silly question, but when I read the documentation everything looked quite straightforward.
Kind regards