I have a dataframe with the following data:
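I would like to create a new column 'event_date' from 'invoice_date', based on conditions on the existing columns. The conditions are:
1) select the maximum 'invoice_date', defined as the most recent date, and
2) select only rows where 'change_status' == 'A' or 'change_status' == 'U'
The resulting dataframe should look like this: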
id|invoice_no|invoice_date|change_previous_month|change_status
984974|110|2016-12-31|0|A
984974|8202|2017-01-30|-64864|D
115677|5505|2016-12-31|0|A
115677|5635|2017-01-30|58730|U
event_date should be created from invoice_date and must satisfy both conditions above. Thanks in advance for your help.
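
To reproduce the problem, the pipe-delimited table above can be loaded into a DataFrame. This is only a minimal sketch: the variable name `raw` is hypothetical, and it assumes `invoice_date` should be parsed as a datetime so that "latest date" means a chronological maximum rather than a string comparison.

```python
import io

import pandas as pd

# Sample rows copied from the table in the question; `raw` is a hypothetical name.
raw = """id|invoice_no|invoice_date|change_previous_month|change_status
984974|110|2016-12-31|0|A
984974|8202|2017-01-30|-64864|D
115677|5505|2016-12-31|0|A
115677|5635|2017-01-30|58730|U"""

# Parse invoice_date as datetime so sorting and max() are chronological.
df = pd.read_csv(io.StringIO(raw), sep="|", parse_dates=["invoice_date"])
```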
Answer 0 (score: 2)
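I think you need:
1) isin with boolean indexing to keep only the rows where change_status is 'A' or 'U'
2) sort_values by the invoice_date column
3) drop_duplicates to keep the last row per id
4) set_index to create a Series indexed by id
5) map the id column onto that Series to fill the new column

s = (df[df['change_status'].isin(['A','U'])]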
        .sort_values('invoice_date')
        .drop_duplicates('id', keep='last')
        .set_index('id')['invoice_date'])
df['event_date'] = df['id'].map(s)
print(df)
id invoice_no invoice_date change_previous_month change_status \
0 984974 110 2016-12-31 0 A
1 984974 8202 2017-01-30 -64864 D
2 115677 5505 2016-12-31 0 A
3 115677 5635 2017-01-30 58730 U
event_date
0 2016-12-31
1 2016-12-31
2 2017-01-30
3 2017-01-30
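
An alternative to the filter, sort and map chain is to blank out the dates of non-qualifying rows and then take a per-id maximum. The sketch below is not the answerer's method. It assumes `invoice_date` is a datetime column, and it rebuilds the sample rows from the question so that it runs on its own; the name `eligible` is introduced here for illustration.

```python
import pandas as pd

# Sample rows from the question, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "id": [984974, 984974, 115677, 115677],
    "invoice_no": [110, 8202, 5505, 5635],
    "invoice_date": pd.to_datetime(
        ["2016-12-31", "2017-01-30", "2016-12-31", "2017-01-30"]),
    "change_previous_month": [0, -64864, 0, 58730],
    "change_status": ["A", "D", "A", "U"],
})

# Keep invoice_date only where change_status is 'A' or 'U'; other rows become NaT.
eligible = df["invoice_date"].where(df["change_status"].isin(["A", "U"]))

# Broadcast the latest eligible date of each id back onto every row of that id.
df["event_date"] = eligible.groupby(df["id"]).transform("max")
```

With this approach, an id that has no 'A' or 'U' row ends up with a missing value (NaT) in event_date, just as map leaves a missing value for an id that is absent from the lookup Series.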