Question

I need help understanding the following code.

df_all['search_term'] = df_all['search_term'].map(lambda x:str_stemmer(x))

Link to the full code: https://www.kaggle.com/wenxuanchen/home-depot-product-search-relevance/sklearn-random-forest/code
Thanks.

Answer 1

I looked at the other questions, but they don't seem to explain yours: what does the map function do?

map takes an iterable and a function, and applies the function to each element of the iterable in turn.

Here is an example:

def square_the_things(value):
    print('Squaring {}'.format(value))
    return value * value


items = [1,2,3,4,5]
squared_items = map(square_the_things, items)

for squared in squared_items:
    print('Squared item is: {}'.format(squared))

Output

Squaring 1
Squared item is: 1
Squaring 2
Squared item is: 4
Squaring 3
Squared item is: 9
Squaring 4
Squared item is: 16
Squaring 5
Squared item is: 25

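Note that we passed the name of the function to map, without the () at the end. A lambda is just a function without a name. In your case, you could actually pass .map(str_stemmer) directly, because str_stemmer only takes one argument.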
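To make that equivalence concrete, here is a minimal sketch. The shout function is a hypothetical stand-in for str_stemmer (it is not part of the original code); the point is only that passing a one-argument function by name and wrapping it in a lambda give the same result.

```python
def shout(text):
    # Hypothetical stand-in for str_stemmer: takes one string, returns one string.
    return text.upper()


words = ["drill", "saw", "hammer"]

# Pass the function object itself (no parentheses after the name).
by_name = list(map(shout, words))

# Wrap the same call in an anonymous function.
by_lambda = list(map(lambda x: shout(x), words))

print(by_name == by_lambda)
```

The lambda adds an extra function call per element without changing the result, so the shorter form is usually preferred.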

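In my example you can see that the first line of output comes from the function: Squaring 1. Then the loop runs its first iteration and prints Squared item is: 1. That's because I'm using Python 3, where map returns a lazy iterator that only calls the function when the next item is requested. In Python 2, the output is different: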

Squaring 1
Squaring 2
Squaring 3
Squaring 4
Squaring 5
Squared item is: 1
Squared item is: 4
Squared item is: 9
Squared item is: 16
Squared item is: 25

That's because Python 2's map applies the function to the whole iterable first and produces a list before the loop starts.
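If you want the Python 2 ordering under Python 3, one option is to wrap the call in list(), which consumes the iterator immediately. A minimal sketch; the log list is an addition made here purely to record the order in which things happen.

```python
log = []


def square_the_things(value):
    log.append('Squaring {}'.format(value))
    return value * value


items = [1, 2, 3]

# list() forces every call to run before the loop starts.
squared_items = list(map(square_the_things, items))

for squared in squared_items:
    log.append('Squared item is: {}'.format(squared))

print('\n'.join(log))
```

All the "Squaring" entries are recorded before any "Squared item is" entry, which matches the Python 2 behaviour described above.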

Answer 2

pandas.Series.map is a bit different from Python's built-in map.

Suppose you have a small dictionary containing the roots of some common words:

roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do", 
              "thought": "think", "came": "come", "begins": "begin"}

You also have a pandas DataFrame with a column of words:

import pandas as pd

df = pd.DataFrame({"word": ["took", "gone", "done", "begins", "came",
                            "thought", "took", "went"]})

      word
0     took
1     gone
2     done
3   begins
4     came
5  thought
6     took
7     went

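If you want an additional column showing the roots of these words, you can use map. For each element in the Series (the column), map checks whether the word exists as a key in the dictionary. If it does, map returns the corresponding value; otherwise it returns NaN: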

df["root"] = df["word"].map(roots_dict)

      word   root
0     took   take
1     gone    NaN
2     done    NaN
3   begins  begin
4     came   come
5  thought  think
6     took   take
7     went     go

You can also pass a Series instead of a dictionary. In that case, map looks each value up in the index of that Series.
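As a rough illustration of the Series case, here is a small sketch. The names roots and words are made up for this example; the lookup Series keeps the words in its index and the roots in its values.

```python
import pandas as pd

# Lookup Series: the index holds the words, the values hold the roots.
roots = pd.Series({"went": "go", "took": "take", "came": "come"})

words = pd.Series(["took", "gone", "went"])

# Each value in `words` is looked up in the index of `roots`;
# values with no matching index label come back as missing.
mapped = words.map(roots)
print(mapped)
```

Here "gone" has no entry in the index of roots, so its result is a missing value, just as with the dictionary.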

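In your example, map is used with a function. That function is designed to take a string (possibly containing several words), convert it to lowercase, split it into words, and apply NLTK's Snowball Stemmer to each word. So, for df_all['search_term'].map(lambda x: str_stemmer(x)), each row in the "search_term" column (x is the string in that row) is passed as input to str_stemmer(). .map collects the values returned by the function and returns another Series containing the stemmed words.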
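The sketch below shows the same pattern end to end without needing NLTK. toy_stemmer is a hypothetical stand-in for str_stemmer: it only lowercases the text and strips a trailing "s" from each word, and it assumes the words are joined back into one string. The sample search terms are invented for illustration.

```python
import pandas as pd


def toy_stemmer(s):
    # Hypothetical stand-in for str_stemmer: lowercase, split into words,
    # strip a trailing "s" from each word, and join the words back together.
    # A real stemmer (e.g. NLTK's) applies far more rules than this.
    return " ".join(w[:-1] if w.endswith("s") else w for w in s.lower().split())


df_all = pd.DataFrame({"search_term": ["Angle Brackets", "deck screws"]})

# The function is called once per row; the results form a new Series,
# which is assigned back to the same column.
df_all["search_term"] = df_all["search_term"].map(toy_stemmer)
print(df_all["search_term"].tolist())
```

Since toy_stemmer takes a single argument, it is passed to .map directly rather than wrapped in a lambda.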
