Converting Apache Spark Scala code to Python

Asked: 2015-06-12 20:30:26

Tags: python scala apache-spark

Can anyone translate this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))

val wordCountsWithGroup = wordPairsRDD
    .groupByKey()
    .map(t => (t._1, t._2.sum))
    .collect()

3 Answers:

Answer 0 (score: 5)

Try this:

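A plausible PySpark translation of the Scala code, assuming `sc` is an available SparkContext as in the question, is `sc.parallelize(words).map(lambda w: (w, 1)).reduceByKey(add).collect()`. Since that needs a running Spark context, the plain-Python sketch below mirrors each RDD step so the logic can be checked locally:

```python
from itertools import groupby

# Plain-Python mirror of the RDD pipeline (a sketch, not actual PySpark)
words = ["one", "two", "two", "three", "three", "three"]

# map(word => (word, 1))
pairs = [(w, 1) for w in words]

# reduceByKey(_ + _): sort by key, group, and sum each group's values
pairs.sort(key=lambda kv: kv[0])
counts = {key: sum(v for _, v in grp)
          for key, grp in groupby(pairs, key=lambda kv: kv[0])}

print(counts)  # {'one': 1, 'three': 3, 'two': 2}
```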

Answer 1 (score: 2)

Two translations in Python:

from operator import add

wordsList = ["one", "two", "two", "three", "three", "three"]
words = sc.parallelize(wordsList).map(lambda w: (w, 1)).reduceByKey(add).collect()
print(words)
words = sc.parallelize(wordsList).map(lambda w: (w, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)
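Both versions produce the same pairs; in Spark, `reduceByKey` is usually preferred over `groupByKey` because it combines values on each partition before shuffling. The equivalence of the two computations can be checked locally without a cluster (a plain-Python sketch, not PySpark):

```python
from collections import Counter

wordsList = ["one", "two", "two", "three", "three", "three"]

# reduceByKey(add) semantics: fold the 1s per key as they arrive
reduce_style = dict(Counter(wordsList))

# groupByKey().map(sum) semantics: collect all 1s per key, then sum
groups = {}
for w in wordsList:
    groups.setdefault(w, []).append(1)
group_style = {k: sum(ones) for k, ones in groups.items()}

assert reduce_style == group_style
print(group_style)  # {'one': 1, 'two': 2, 'three': 3}
```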

Answer 2 (score: 2)

Assuming you already have a Spark context defined and ready:


See the GitHub examples repo: Python Examples