我有两个数据框df
和times
,分别代表维护记录和每月时间。我想根据times
中的数据向df
追加一列:
#df represents car maintenance records
data = {"07-18-2012": ["replaced wheels", 45, 200], "09-12-2014": ["changed oil", 30, 40], "09-18-2015": ["fixed dent", 92, 0]}
df = pd.DataFrame.from_dict(data, orient = "index")
df.index = pd.to_datetime(df.index)
df.sort_index(inplace = True)
df.columns = ["description", "mins_spent", "cost"]
#times represents monthly periods
rng = pd.date_range(start = '12/31/2013', end = '1/1/2015', freq='M')
ts = pd.Series(rng)
times = ts.to_frame(name = "months")
我正在尝试向days_since_maintenance
添加名为times
的新列,该列代表自df
我尝试使用df.ix[]
,迭代for loop
和searchsorted()
。
df
description mins_spent cost
2012-07-18 replaced wheels 45 200
2014-09-12 changed oil 30 40
2015-09-18 fixed dent 92 0
times
months
0 2013-12-31
1 2014-01-31
2 2014-02-28
3 2014-03-31
4 2014-04-30
5 2014-05-31
6 2014-06-30
7 2014-07-31
8 2014-08-31
9 2014-09-30
10 2014-10-31
11 2014-11-30
12 2014-12-31
所需的DataFrame:
months days_since_maintenance
0 2013-12-31 531 days
1 2014-01-31 562 days
2 2014-02-28 ...
3 2014-03-31 ...
4 2014-04-30 ...
5 2014-05-31 ...
6 2014-06-30 ...
7 2014-07-31 ...
8 2014-08-31 774 days
9 2014-09-30 18 days
10 2014-10-31 ...
11 2014-11-30 ...
12 2014-12-31 ...
答案 0 :(得分:1)
apply
有效,但使用index
而不是为dates
添加新列:
def days_since_x(row, df):
'''returns days between the row date
and the most recent maintenance date in df'''
#filter records
all_maint_prior = df[(df.index <= row)]
if all_maint_prior.empty:
return float('NaN')
else:
#get last row of filtered results
most_recent = all_maint_prior.iloc[-1]
#return difference in dates
return row-most_recent.name
times["days_since_maintenance"] = times["months"].apply(lambda row: days_since_x (row,df))
答案 1 :(得分:0)
嗯,这绝对不是最好的解决方案,因为它会遍历您的df.index
:
for d in df.index:
times.ix[times['months'] >= d, 'days_since_maintenance'] = times['months'] - d
In [123]: times
Out[123]:
months days_since_maintenance
0 2013-12-31 531 days
1 2014-01-31 562 days
2 2014-02-28 590 days
3 2014-03-31 621 days
4 2014-04-30 651 days
5 2014-05-31 682 days
6 2014-06-30 712 days
7 2014-07-31 743 days
8 2014-08-31 774 days
9 2014-09-30 18 days
10 2014-10-31 49 days
11 2014-11-30 79 days
12 2014-12-31 110 days
答案 2 :(得分:0)
df['dates'] = df.index
def days_from_closest(x, df):
closest = df[df['dates'] < x].ix[-1]
return x - closest.dates
times['days_since_maintenance'] = times['months'].apply(lambda x: days_from_closest(x, df))
months days_since_maintenance
0 2013-12-31 531 days
1 2014-01-31 562 days
2 2014-02-28 590 days
3 2014-03-31 621 days
4 2014-04-30 651 days
5 2014-05-31 682 days
6 2014-06-30 712 days
7 2014-07-31 743 days
8 2014-08-31 774 days
9 2014-09-30 18 days
10 2014-10-31 49 days
11 2014-11-30 79 days
12 2014-12-31 110 days
[13行x 2列]