
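I'm fairly new to Spark. My task is to take a set of tweets and find the top 100 words for each letter of the lowercase alphabet (a to z), keyed by the word's first character. For example: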

a: (word1, count1), (word2, count2).. (word100, count100) 
b: (word1, count1), (word2, count2).. (word100, count100) 
.
.
z: (word1, count1), (word2, count2).. (word100, count100)
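To pin down the intended result, here is a minimal plain-Python sketch (no Spark) that computes the same per-character top N on a small in-memory list of tweets. The function name `top_words_by_first_char` and the alphabetical tie-break among equal counts are assumptions, not part of the question; a reference like this is mainly handy for checking a Spark job's output on a sample.

```python
from collections import Counter, defaultdict


def top_words_by_first_char(tweets, valid_chars, n=100):
    """Hypothetical reference: {char: [(word, count), ...]} with up to n words per char."""
    counts = defaultdict(Counter)  # char -> Counter of words starting with that char
    for tweet in tweets:
        for word in tweet.split():  # split() never yields empty strings, so word[0] is safe
            if word[0] in valid_chars:
                counts[word[0]][word] += 1
    # Sort by descending count, then alphabetically (assumed tie-break), and keep n
    return {
        char: sorted(cnt.items(), key=lambda wc: (-wc[1], wc[0]))[:n]
        for char, cnt in counts.items()
    }
```

For example, `top_words_by_first_char(["apple ant apple", "banana bat ant", "cat"], set("ab"), n=2)` keeps only words starting with `a` or `b` and returns two entries per letter.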

Here is my code:

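# en_text: RDD of tweet strings; valid_chars: set of allowed first characters
words_mapped = (en_text.flatMap(lambda x: x.split())           # split each tweet into words
                       .filter(lambda x: x[0] in valid_chars)  # keep words starting with a valid char
                       .map(lambda x: (x[0], x)))              # (first char, word) pairs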

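This gives me (character, word) tuples. Now I need to group by character, count how often each word occurs within each group, and output the 100 most frequent words with their counts.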

How can I express this in PySpark?
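One way to finish the job, sketched with the RDD API and assuming (as in the question) that `en_text` is an RDD of tweet strings and `valid_chars` is a set of letters: count each `(char, word)` pair with `reduceByKey`, re-key by the first character, then keep the top N per character. The helper names `top_n`, `top_words_per_char` and `run_local_demo`, and the alphabetical tie-break, are hypothetical; a DataFrame solution using a window function with `row_number` would be another valid route.

```python
import heapq


def top_n(word_counts, n=100):
    """Return up to n (word, count) pairs, highest count first; ties broken alphabetically."""
    return heapq.nsmallest(n, word_counts, key=lambda wc: (-wc[1], wc[0]))


def top_words_per_char(en_text, valid_chars, n=100):
    """Hypothetical helper; en_text is assumed to be an RDD of tweet strings."""
    return (en_text
            .flatMap(lambda tweet: tweet.split())           # tweets -> words
            .filter(lambda w: w[0] in valid_chars)          # keep words with a valid first char
            .map(lambda w: ((w[0], w), 1))                  # ((char, word), 1)
            .reduceByKey(lambda a, b: a + b)                # ((char, word), count)
            .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))  # (char, (word, count))
            .groupByKey()                                   # (char, iterable of (word, count))
            .mapValues(lambda pairs: top_n(pairs, n)))      # (char, top-n list)


def run_local_demo():
    """Hypothetical usage example; requires pyspark to be installed."""
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "top-words-per-char")
    try:
        tweets = sc.parallelize(["apple ant apple", "banana bat ant"])
        result = top_words_per_char(tweets, set("abcdefghijklmnopqrstuvwxyz"), n=100)
        for char, words in sorted(result.collect()):
            print(char, words)
    finally:
        sc.stop()
```

With the question's own variables this would be `top_words_per_char(en_text, valid_chars).collect()`. A design note: `groupByKey` gathers every distinct (word, count) pair for a letter on one executor before trimming to N. With a large vocabulary, `aggregateByKey` keeping a bounded top-N list per partition would likely use less memory; for 26 keys and modest data the simpler version should be adequate.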

Find the top 100 words for each character (PySpark)
