Question

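I have 70,000 elements that must be used in an unsupervised learning job. I have tried to get the job running with K-means and Bisecting K-means, using TF-IDF to create the input RDD for the algorithms. I also tried dimensionality reducers, specifically SVD and PCA, but the reducers do not work either: I always run into heap space problems. This is my configuration: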

spark-submit \
--class myclass \
--master yarn \
--deploy-mode cluster \
--driver-memory 1000mb \
--executor-memory 1000mb \
--num-executors 5 \
--driver-java-options "-Dcommons.config.resource=configs/homolog-jobs.properties -Dhbase.build.file.location=PATH" \
--conf 'spark.executor.extraJavaOptions=-Dcommons.config.resource=configs/homolog-jobs.properties -Dhbase.build.file.location=PATH' \
--conf 'spark.driver.extraJavaOptions=-Dcommons.config.resource=configs/homolog-jobs.properties -Dhbase.build.file.location=PATH' \
/path to my jar \
10  # number of clusters
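
For context, here is a rough back-of-the-envelope sketch of the memory involved. It is only an estimate under stated assumptions: the TF-IDF dimensionality is not given in the question, so the sketch assumes 2^20 features (the default of the RDD-based `HashingTF`), assumes the K-means centers are held as dense vectors, and assumes PCA materializes a full features-by-features covariance matrix. The constant and function names are hypothetical.

```python
# Rough memory estimate. The feature count is an ASSUMPTION, not a value
# taken from the question; adjust NUM_FEATURES to the real TF-IDF size.
NUM_DOCS = 70_000                          # elements mentioned in the question
NUM_FEATURES = 1 << 20                     # assumed TF-IDF dimensionality (2^20)
NUM_CLUSTERS = 10                          # last argument of the spark-submit call
BYTES_PER_DOUBLE = 8
EXECUTOR_HEAP_BYTES = 1000 * 1024 * 1024   # --executor-memory 1000mb


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes needed to hold a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


# K-means centers, assuming each center is a dense vector.
centers = dense_bytes(NUM_CLUSTERS, NUM_FEATURES)

# Covariance matrix, assuming PCA builds the full features x features matrix.
covariance = dense_bytes(NUM_FEATURES, NUM_FEATURES)

# The whole data set if the sparse TF-IDF vectors were ever densified.
dense_data = dense_bytes(NUM_DOCS, NUM_FEATURES)

for label, size in [
    ("K-means centers", centers),
    ("PCA covariance matrix", covariance),
    ("densified data set", dense_data),
]:
    verdict = "fits in" if size < EXECUTOR_HEAP_BYTES else "exceeds"
    print(f"{label}: {to_gib(size):.3f} GiB ({verdict} a 1000 MB heap)")
```

Under these assumptions the covariance matrix alone is far larger than a 1000 MB heap, which would be consistent with the heap space errors described above. A smaller feature space (for example, a lower hashing dimension) shrinks every one of these numbers.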

Scalable clustering algorithms in Spark

0 answers: