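I'm trying to go through each row (iterrows()?), find the most recent date (a sort function?), and put it in column G.
I'm having trouble combining the row iteration with the sorting.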
Current data:
   A         B         C         D         E       F
0  1  20171018  20171019  20171001  20171002  id_123
1  2       NaN  20171005  20171006  20171003  id_234
2  3       NaN       NaN  20171019  20171020  id_345
3  4       NaN       NaN       NaN  20171021  id_456
Desired output (column G holds the most recent date in each row):
   A         B         C         D         E       F         G
0  1  20171018  20171019  20171001  20171002  id_123  20171019
1  2       NaN  20171005  20171006  20171003  id_234  20171006
2  3       NaN       NaN  20171019  20171020  id_345  20171020
3  4       NaN       NaN       NaN  20171021  id_456  20171021
EDIT: I have already converted the date columns with datetime.
Answer 0 (score: 3)
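You can call .max() on the DataFrame to get the most recent date. Pass axis=1 so that the maximum is computed across each row instead of down each column.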
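A small sketch of what axis=1 changes, using two hypothetical date columns B and C: axis=1 returns one value per row, while the default axis=0 returns one value per column. Missing dates (NaT) are skipped because skipna defaults to True, so a row with no dates at all comes back as missing.

```python
import pandas as pd

# Hypothetical two-column frame; None becomes NaT after to_datetime
df = pd.DataFrame({
    "B": pd.to_datetime(["2017-10-18", None, None]),
    "C": pd.to_datetime(["2017-10-19", "2017-10-05", None]),
})

# axis=1: one maximum per row; NaT is ignored (skipna=True by default)
row_max = df.max(axis=1)

# axis=0 (the default): one maximum per column instead
col_max = df.max()

print(row_max)
print(col_max)
```

The third row contains only NaT, so its row maximum is missing rather than an error.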
import pandas as pd
data = {'A': [1, 2, 3, 4],
        'B': ['20171018', '', '', ''],
        'C': ['20171019', '20171005', '', ''],
        'D': ['20171001', '20171006', '20171019', ''],
        'E': ['20171002', '20171003', '20171020', '20171021'],
        'F': ['id_123', 'id_234', 'id_345', 'id_456']
        }
df = pd.DataFrame(data)
# convert the date columns to datetimes (empty strings become NaT)
for c in 'BCDE':
    df[c] = pd.to_datetime(df[c])
# new column G: the latest date in each row (NaT values are skipped)
df['G'] = df[['B', 'C', 'D', 'E']].max(axis=1)
print(df)
   A          B          C          D          E       F          G
0  1 2017-10-18 2017-10-19 2017-10-01 2017-10-02  id_123 2017-10-19
1  2        NaT 2017-10-05 2017-10-06 2017-10-03  id_234 2017-10-06
2  3        NaT        NaT 2017-10-19 2017-10-20  id_345 2017-10-20
3  4        NaT        NaT        NaT 2017-10-21  id_456 2017-10-21