I have a data frame:
df2.head(5)
Out[78]:
User Date movie
0 User1 2019-07-02 [Bridge to Terabithia]
1 User1 2019-07-04 [Defiance]
2 User1 2019-07-05 [Click]
3 User1 2019-07-07 [Big Stan]
4 User1 2019-07-14 [Death at a Funeral]
The elements of the "movie" column are lists, and I am trying to apply a lambda function as follows:
df2['movie'] = df2['movie'].apply(lambda x : x[0])
df2.head(5)
Out[79]:
User Date movie
0 User1 2019-07-02 Bridge to Terabithia
1 User1 2019-07-04 NaN
2 User1 2019-07-05 NaN
3 User1 2019-07-07 NaN
4 User1 2019-07-14 NaN
The desired output is:
User Date movie
0 User1 2019-07-02 Bridge to Terabithia
1 User1 2019-07-04 Defiance
2 User1 2019-07-05 Click
3 User1 2019-07-07 Big Stan
4 User1 2019-07-14 Death at a Funeral
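I can't understand why it gives me this output.
Answer 0 (score: 2)
Next time, please provide a fully reproducible example (including the code that creates the data frame); it saves every reviewer time.
Your code works fine for me: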
import pandas as pd
# data
df2 = pd.DataFrame({'User': ['User1'] * 5,
                    'Date': ['2019-07-02',
                             '2019-07-04',
                             '2019-07-05',
                             '2019-07-07',
                             '2019-07-14'],
                    'movie': [
                        ['Bridge to Terabithia'],
                        ['Defiance'],
                        ['Click'],
                        ['Big Stan'],
                        ['Death at a Funeral']
                    ]})
print(df2.head(5))
print()
df2['movie'] = df2['movie'].apply(lambda x : x[0])
print(df2.head(5))
Which yields:
Date User movie
0 2019-07-02 User1 [Bridge to Terabithia]
1 2019-07-04 User1 [Defiance]
2 2019-07-05 User1 [Click]
3 2019-07-07 User1 [Big Stan]
4 2019-07-14 User1 [Death at a Funeral]
Date User movie
0 2019-07-02 User1 Bridge to Terabithia
1 2019-07-04 User1 Defiance
2 2019-07-05 User1 Click
3 2019-07-07 User1 Big Stan
4 2019-07-14 User1 Death at a Funeral
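Since the same lambda works on a freshly built frame, a reasonable first check on the original data is whether every cell in the column really is a list. The snippet below is only a sketch: the frame `mixed` and its contents are made up for illustration, and nothing in the question confirms that mixed types are the actual cause of the NaN values.

```python
import pandas as pd

# Hypothetical frame (not the asker's data): one cell is a plain
# string instead of a list.
mixed = pd.DataFrame({'movie': [['Defiance'], 'Click', ['Big Stan']]})

# Count how many cells hold each Python type, by type name.
type_counts = mixed['movie'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If anything other than `list` shows up in the counts for the real column, those rows are worth inspecting before applying `x[0]`.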
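Now, when I want to debug an .apply call that uses a lambda, I usually start with a regular function, where I can set breakpoints and inspect what is happening. Once it behaves correctly, I replace it with the lambda. This is what I would do in your case: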
def extract_first(x):
    # here you can put breakpoints, print stuff, etc.
    return x[0]

df2['movie'] = df2['movie'].apply(extract_first)
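As an illustration of what such a debugging function might look like, here is a variant that reports any value that is not a non-empty list instead of indexing into it blindly. The name `extract_first_debug` and the `sample` frame are hypothetical and exist only for this example.

```python
import pandas as pd

def extract_first_debug(x):
    # Flag anything that is not a non-empty list so odd rows stand out,
    # and pass the value through unchanged.
    if not isinstance(x, list) or len(x) == 0:
        print(f"unexpected value: {x!r} ({type(x).__name__})")
        return x
    return x[0]

# Hypothetical data: the last cell is already a plain string.
sample = pd.DataFrame({'movie': [['Defiance'], ['Click'], 'Big Stan']})
sample['movie'] = sample['movie'].apply(extract_first_debug)
print(sample)
```

Once the printed messages show which rows are unexpected (or confirm there are none), the function can be swapped back for the one-line lambda.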