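I'm trying to write a function that modifies a DataFrame and then passes back one column of the modified DataFrame. The code looks like this: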
import pandas as pd

def foo(df):
    # Earliest date per group among rows where 'bool' is True
    ser = df[df['bool']].groupby('group')['date'].min()
    # Hackish way to merge back to df
    serdf = pd.DataFrame({'trigger_date': ser.values, 'group': ser.index.values})
    df = pd.merge(df, serdf, how='left', on='group')
    return df['trigger_date']

dfFinal['trigger_date'] = foo(dfFinal)
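When I print df inside foo just before the return statement, all of its values are in the right places and it has the right length. However, dfFinal['trigger_date'] has NaT in many places.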
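The NaT values most likely come from index alignment: `pd.merge` on columns returns a frame with a fresh 0..n-1 index, and assigning a Series to a column aligns on index labels, not on positions. A minimal sketch with made-up values, assuming the real dfFinal has an index other than 0..n-1 (for example after filtering):

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({"group": ["A", "B", "A"], "x": [1, 2, 3]}, index=[10, 20, 30])
lookup = pd.DataFrame({
    "group": ["A", "B"],
    "trigger_date": pd.to_datetime(["2000-04-10", "2000-03-15"]),
})

merged = pd.merge(df, lookup, how="left", on="group")
print(merged.index.tolist())  # [0, 1, 2]: the labels 10, 20, 30 are gone

# Assignment aligns on labels, and none of 0, 1, 2 exist in df's index
df["trigger_date"] = merged["trigger_date"]
print(df["trigger_date"].isna().all())  # True
```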
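Just to give an idea of what I'm trying to achieve: for each group, I want to select the earliest date that satisfies a condition and assign it to a new column: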
Group bool date
A n 2000-01-01
A n 2000-03-02
A y 2000-04-10
A y 2001-01-01
B n 2000-02-20
B y 2000-03-15
B y 2000-04-27
B y 2001-01-10
This should become:
Group bool date trigger_date
A n 2000-01-01 2000-04-10
A n 2000-03-02 2000-04-10
A y 2000-04-10 2000-04-10
A y 2001-01-01 2000-04-10
B n 2000-02-20 2000-03-15
B y 2000-03-15 2000-03-15
B y 2000-04-27 2000-03-15
B y 2001-01-10 2000-03-15
Answer 0 (score: 2)
First group by Group, then apply a custom function:
In [34]: def func(df):
   ....:     df['trigger_date'] = df[df['bool'] == 'y']['date'].min()
   ....:     return df
   ....:

In [35]: df.groupby('Group', group_keys=False).apply(func)
Out[35]:
Group bool date trigger_date
0 A n 2000-01-01 00:00:00 2000-04-10 00:00:00
1 A n 2000-03-02 00:00:00 2000-04-10 00:00:00
2 A y 2000-04-10 00:00:00 2000-04-10 00:00:00
3 A y 2001-01-01 00:00:00 2000-04-10 00:00:00
4 B n 2000-02-20 00:00:00 2000-03-15 00:00:00
5 B y 2000-03-15 00:00:00 2000-03-15 00:00:00
6 B y 2000-04-27 00:00:00 2000-03-15 00:00:00
7 B y 2001-01-10 00:00:00 2000-03-15 00:00:00
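The same per-group broadcast can likely be written without `apply` by using `groupby(...).transform`, which returns a result on the original index and does not modify the groups. A sketch of that swapped-in technique (not the answerer's code), assuming `bool` holds the strings 'y'/'n' and `date` is a datetime column:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A"] * 4 + ["B"] * 4,
    "bool": ["n", "n", "y", "y", "n", "y", "y", "y"],
    "date": pd.to_datetime([
        "2000-01-01", "2000-03-02", "2000-04-10", "2001-01-01",
        "2000-02-20", "2000-03-15", "2000-04-27", "2001-01-10",
    ]),
})

# Blank out dates that don't meet the condition (they become NaT), then take
# the per-group minimum; transform broadcasts it back onto every row.
df["trigger_date"] = (
    df["date"].where(df["bool"] == "y")
    .groupby(df["Group"])
    .transform("min")
)
print(df)
```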
Answer 1 (score: 1)
First, I need to recreate your data:
from io import StringIO
import pandas as pd

a = StringIO("""A n 2000-01-01
A n 2000-03-02
A y 2000-04-10
A y 2001-01-01
B n 2000-02-20
B y 2000-03-15
B y 2000-04-27
B y 2001-01-10""")
b = "Group bool date".split()
d = pd.DataFrame([i.split() for i in a], columns=b)
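The same data can probably also be loaded in one call with `pd.read_csv`, splitting on whitespace and parsing the dates on the way in. A sketch; the variable name `raw` is just for illustration:

```python
from io import StringIO
import pandas as pd

raw = """A n 2000-01-01
A n 2000-03-02
A y 2000-04-10
A y 2001-01-01
B n 2000-02-20
B y 2000-03-15
B y 2000-04-27
B y 2001-01-10
"""

# Whitespace-separated, no header row; parse 'date' as datetime64
d = pd.read_csv(
    StringIO(raw),
    sep=r"\s+",
    header=None,
    names=["Group", "bool", "date"],
    parse_dates=["date"],
)
print(d.dtypes)
```

With real datetimes instead of strings, min/max and sorting no longer rely on the dates happening to be in ISO format.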
As for the solution, how about:
dic = {'y': True, 'n': False}
d['bool'] = d['bool'].map(dic)
# Earliest 'y' row per Group: sort by date, then keep the first row of each group
trigger = d[d['bool']].sort_values('date').drop_duplicates('Group').drop('bool', axis=1)
d = d.merge(trigger, how='left', on='Group', suffixes=('', '_trigger'))
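An alternative to the sort/drop_duplicates step is to pick the earliest qualifying row per group directly with `groupby(...).idxmin()`. A sketch of this swapped-in technique (not the answerer's), assuming `bool` has already been mapped to True/False and `date` is a datetime column:

```python
import pandas as pd

d = pd.DataFrame({
    "Group": ["A"] * 4 + ["B"] * 4,
    "bool": [False, False, True, True, False, True, True, True],
    "date": pd.to_datetime([
        "2000-01-01", "2000-03-02", "2000-04-10", "2001-01-01",
        "2000-02-20", "2000-03-15", "2000-04-27", "2001-01-10",
    ]),
})

# Index label of the earliest True row within each Group
first_idx = d[d["bool"]].groupby("Group")["date"].idxmin()
trigger = d.loc[first_idx, ["Group", "date"]]

# The overlapping 'date' column from the right side becomes 'date_trigger'
d = d.merge(trigger, how="left", on="Group", suffixes=("", "_trigger"))
print(d)
```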
Edit
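The OP needs a Series with the same index as the original DataFrame, so I copied @waitingkuo's groupby function and adapted the answer accordingly. I hope someone can come up with a more idiomatic way to do this!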
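import pandas as pd

def trigger(df):
    def min_y(d):
        # Earliest 'y' date within one group
        return d[d['bool'] == 'y']['date'].min()
    dt = df.groupby('Group').apply(min_y)
    dt = dt.to_frame('trigger_date').reset_index()
    ix = df.index.copy(deep=True)  # remember the caller's index
    df = df.merge(dt, how='left', on='Group')  # merge resets the index to 0..n-1
    ser = df['trigger_date']
    ser.index = ix  # restore it so the result aligns with the caller's frame
    return ser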
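The merge and the index bookkeeping can likely be avoided altogether: compute the per-group minimum once and look each row's group up with `Series.map`, which keeps the caller's index. A sketch assuming the same column names and 'y'/'n' strings; `trigger_via_map` is a hypothetical name:

```python
import pandas as pd

def trigger_via_map(df):
    # Earliest 'y' date per Group, as a Series indexed by Group
    per_group = df[df["bool"] == "y"].groupby("Group")["date"].min()
    # map() returns a Series on df's own index, so assignment lines up
    return df["Group"].map(per_group).rename("trigger_date")

df = pd.DataFrame(
    {
        "Group": ["A", "A", "B", "B"],
        "bool": ["n", "y", "n", "y"],
        "date": pd.to_datetime(["2000-01-01", "2000-04-10", "2000-02-20", "2000-03-15"]),
    },
    index=[7, 3, 11, 5],  # deliberately not 0..n-1
)
df["trigger_date"] = trigger_via_map(df)
print(df)
```

Groups with no 'y' row simply get NaT, which matches what the left merge produces.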