Question

I'm working on a simple word-count problem and am trying to solve it using map, filter, and reduce.

Here is an example of wordRDD (the list used with Spark):

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

All I need is to count the words and present them as tuples:

counts = [('cat', 1), ('elephant', 1), ('rat', 1), ('rat', 1), ('cat', 1)]

I tried a simple map() with a lambda:

counts = myLst.map(lambdas x: (x, <HERE IS THE PROBLEM>))

I may have the syntax wrong, or I may simply be confused. P.S.: This is not a duplicate question, because the other answers suggest using if/else or list comprehensions.

Thanks for your help.

Answer 1

This doesn't use a lambda, but it gets the job done.

from collections import Counter
c = Counter(myLst)
result = list(c.items())

Output:

In [21]: result
Out[21]: [('cats', 3), ('rats', 2), ('elephants', 1)]
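
The order of `c.items()` depends on the Python version, so if the result should be sorted by frequency, `Counter.most_common()` may be a safer choice. A minimal sketch, assuming the same input list (renamed `my_lst` here only for illustration):

```python
from collections import Counter

my_lst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# most_common() returns (word, count) tuples sorted by descending count
result = Counter(my_lst).most_common()
print(result)  # [('cats', 3), ('rats', 2), ('elephants', 1)]
```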

Answer 2

You don't need map(..) at all. You can do it with reduce(..) alone:

>>> def function(obj, x):
...     obj[x] += 1
...     return obj
...
>>> from collections import defaultdict
>>> from functools import reduce
>>> reduce(function, myLst, defaultdict(int)).items()
dict_items([('elephants', 1), ('rats', 2), ('cats', 3)])

You can then iterate over the result.
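
As a rough illustration of iterating over the reduced counts, here is a self-contained sketch. The helper name `add_word` and the choice to sort by word are assumptions made for the example, not part of the answer above:

```python
from collections import defaultdict
from functools import reduce

my_lst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

def add_word(counts, word):
    # Increment the running count for this word and hand the dict back to reduce
    counts[word] += 1
    return counts

counts = reduce(add_word, my_lst, defaultdict(int))

# Sort by word so the iteration order does not depend on the Python version
pairs = sorted(counts.items())
for word, n in pairs:
    print(word, n)
```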

However, there is a better way: take a look at Counter.

Answer 3

If you don't want the full reduce step done for you (the one that aggregates the counts in SuperSaiyan's answer), you can use map like this:

    >>> myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
    >>> counts = list(map(lambda s: (s,1), myLst))
    >>> print(counts)
    [('cats', 1), ('elephants', 1), ('rats', 1), ('rats', 1), ('cats', 1), ('cats', 1)]
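
If the totals are needed later, the (word, 1) pairs can be folded together in a separate reduce step. The following is a minimal plain-Python sketch; `merge_pair` is a hypothetical helper introduced for illustration. On an actual Spark RDD, the analogous step would typically be `reduceByKey`, which is not shown here.

```python
from functools import reduce

my_lst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# Map step: pair every word with an initial count of 1
pairs = map(lambda s: (s, 1), my_lst)

def merge_pair(totals, pair):
    word, n = pair
    # Build a new dict rather than mutating the accumulator
    return {**totals, word: totals.get(word, 0) + n}

# Reduce step: fold the pairs into per-word totals
totals = reduce(merge_pair, pairs, {})
counts = list(totals.items())
print(counts)
```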

Answer 4

You can get this result using map():

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

list(map(lambda x : (x,len(x)), myLst))
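
Note that `len(x)` is the length of each word, not the number of times it occurs, so this produces (word, length) pairs. A small sketch showing the difference; the count-based variant using `list.count` is an assumption about what the question is after, and it is quadratic in the list size:

```python
my_lst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# Pairs each word with its character length
lengths = list(map(lambda x: (x, len(x)), my_lst))

# Pairs each word with how often it appears in the list
occurrences = list(map(lambda x: (x, my_lst.count(x)), my_lst))

print(lengths[:3])      # [('cats', 4), ('elephants', 9), ('rats', 4)]
print(occurrences[:3])  # [('cats', 3), ('elephants', 1), ('rats', 2)]
```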

(Key, Value) Pairs Using Python Lambdas
