Can't understand the behavior of a lambda in pandas

Time: 2019-09-01 06:12:07

Tags: python pandas dataframe lambda

I have a dataframe:

df2.head(5)
Out[78]: 
    User        Date                   movie
0  User1  2019-07-02  [Bridge to Terabithia]
1  User1  2019-07-04              [Defiance]
2  User1  2019-07-05                 [Click]
3  User1  2019-07-07              [Big Stan]
4  User1  2019-07-14    [Death at a Funeral]

The elements of the "movie" column are lists. Now I am trying to run the following lambda function:

df2['movie'] = df2['movie'].apply(lambda x : x[0])

df2.head(5)
Out[79]: 
    User        Date               movie
0  User1  2019-07-02 Bridge to Terabithia
1  User1  2019-07-04                 NaN
2  User1  2019-07-05                 NaN
3  User1  2019-07-07                 NaN
4  User1  2019-07-14                 NaN

The desired output is:

    User        Date                 movie
0  User1  2019-07-02  Bridge to Terabithia
1  User1  2019-07-04              Defiance
2  User1  2019-07-05                 Click
3  User1  2019-07-07              Big Stan
4  User1  2019-07-14    Death at a Funeral

I can't understand why it gives me this output. Why does this happen?

1 Answer:

Answer 0 (score: 2)

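Next time, please provide a fully reproducible example (including the code used to create the dataframe); it saves time for everyone who reviews the question.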
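If the dataframe already exists in your session, one convenient way to get that creation code is `DataFrame.to_dict('list')`, which maps each column name to a list of its values; the printed dict can be pasted straight into `pd.DataFrame(...)`. Its repr also makes it obvious whether the `movie` cells are real Python lists or something that merely prints like one. A minimal sketch, where the two-row `df2` is a stand-in built from the question's data, not the asker's actual object:

```python
import pandas as pd

# Stand-in for the asker's df2 (two rows copied from the question)
df2 = pd.DataFrame({'User': ['User1', 'User1'],
                    'Date': ['2019-07-02', '2019-07-04'],
                    'movie': [['Bridge to Terabithia'], ['Defiance']]})

# orient='list' gives {column name: [values]}, ready to paste into pd.DataFrame(...)
snippet = df2.head(5).to_dict('list')
print(snippet)

# Rebuilding from the printed dict reproduces the same cell values
rebuilt = pd.DataFrame(snippet)
print(rebuilt)
```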

Your code works fine for me:

import pandas as pd

# data
df2 = pd.DataFrame({'User': ['User1'] * 5,
                    'Date': ['2019-07-02',
                             '2019-07-04',
                             '2019-07-05',
                             '2019-07-07',
                             '2019-07-14'],
                    'movie': [
                        ['Bridge to Terabithia'],
                        ['Defiance'],
                        ['Click'],
                        ['Big Stan'],
                        ['Death at a Funeral']
                    ]})

print(df2.head(5))
print()

df2['movie'] = df2['movie'].apply(lambda x : x[0])
print(df2.head(5))

which yields:

         Date   User                   movie
0  2019-07-02  User1  [Bridge to Terabithia]
1  2019-07-04  User1              [Defiance]
2  2019-07-05  User1                 [Click]
3  2019-07-07  User1              [Big Stan]
4  2019-07-14  User1    [Death at a Funeral]

         Date   User                 movie
0  2019-07-02  User1  Bridge to Terabithia
1  2019-07-04  User1              Defiance
2  2019-07-05  User1                 Click
3  2019-07-07  User1              Big Stan
4  2019-07-14  User1    Death at a Funeral
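As an aside, the pandas `.str` accessor also supports indexing into list-valued cells, so the same extraction can be written without a lambda as `df2['movie'].str[0]` (equivalent to `.str.get(0)`). One behavioral difference worth knowing: for an empty list, `.str[0]` yields NaN, whereas `lambda x: x[0]` raises `IndexError`. A minimal sketch, where the empty third cell is an assumption added only to show that difference:

```python
import pandas as pd

# Two cells from the question plus a hypothetical empty list
movies = pd.Series([['Bridge to Terabithia'], ['Defiance'], []])

# .str[0] takes element 0 of each list; out-of-range positions become NaN
first = movies.str[0]
print(first)

# The lambda version fails on the empty list instead of returning NaN
try:
    movies.apply(lambda x: x[0])
except IndexError as exc:
    print("apply with x[0] raised IndexError:", exc)
```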

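Personally, when I want to debug `.apply` with a lambda function, I usually start with a regular function instead, where I can set breakpoints and inspect what is happening. Once it behaves correctly, I replace it with the lambda. Here is what I would do in your case: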

def extract_first(x):
    # here you can put breakpoints, print stuff, etc.
    return x[0]

df2['movie'] = df2['movie'].apply(extract_first)
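Taking that idea one step further, the regular function can print any cell that is not a non-empty list before indexing into it, which usually shows quickly why some rows come out differently than expected. A minimal sketch: `extract_first_verbose` and its checks are hypothetical, not part of the original answer, and the deliberately mixed data below is invented purely to trigger the messages:

```python
import pandas as pd

def extract_first_verbose(x):
    # Hypothetical debugging variant of extract_first:
    # report any cell that is not a non-empty list instead of indexing it
    if not isinstance(x, list) or len(x) == 0:
        print(f"unexpected cell: type={type(x).__name__}, value={x!r}")
        return None
    return x[0]

# Invented data: one proper list, one plain string, one empty list
df_debug = pd.DataFrame({'movie': [['Bridge to Terabithia'], 'Defiance', []]})
result = df_debug['movie'].apply(extract_first_verbose)
print(result)
```

Once the output looks right, the checks can be dropped and the function collapsed back into `lambda x: x[0]`.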