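I am trying to select the rows that fall between two dates and set a column's value based on the maximum (and minimum) price within that date range. It would also help if someone could point out the fastest way to do this.
I tried the code below, but it creates new rows instead of changing the existing ones.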
def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    new_df = dfIn.loc[mask]
    if len(new_df) == 0:
        return dfIn
    dfIn.loc[dfIn.loc[mask].price.max(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask].price.min(), 'lowest'] = 1
    return dfIn
date price highest lowest
2000-05-01 04:00:00 4.439730 0 0
2000-05-02 04:00:00 4.209830 0 0
2000-05-03 04:00:00 4.109380 0 0
2000-05-04 04:00:00 3.953130 0 0
2000-05-05 04:00:00 4.040180 0 0
2000-05-08 04:00:00 3.933040 0 0
2000-05-09 04:00:00 3.765630 0 0
2000-05-10 04:00:00 3.546880 0 0
2000-05-11 04:00:00 3.671880 0 0
2000-05-12 04:00:00 3.843750 0 0
2000-05-15 04:00:00 3.607150 0 0
2000-05-16 04:00:00 3.774560 0 0
2000-05-17 04:00:00 3.620540 0 0
2000-05-18 04:00:00 3.598220 0 0
2000-05-19 04:00:00 3.357150 0 0
2000-05-22 04:00:00 3.212060 0 0
2000-05-23 04:00:00 3.064740 0 0
2000-05-24 04:00:00 3.131700 0 0
2000-05-25 04:00:00 3.116630 0 0
2000-05-26 04:00:00 3.084830 0 0
2000-05-30 04:00:00 3.127230 0 0
2000-05-31 04:00:00 3.000000 0 0
2000-06-01 04:00:00 3.183040 0 0
2000-06-02 04:00:00 3.305810 0 0
.....
2000-06-30 04:00:00 3.261160 0 0
Ideally, the existing rows should be updated as follows:
df = updateRecord(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')
The resulting df should contain:
2000-05-01 04:00:00 4.439730 1 0
2000-05-31 04:00:00 3.000000 0 1
My current code creates new rows instead of updating the existing ones.
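For reference, the symptom can be reproduced in isolation. The sketch below assumes a default integer index and uses three rows from the table above. Passing a price value such as `price.max()` to `.loc` makes pandas treat that number as a row label; because no row carries that label, the assignment appends a new row. `idxmax()` returns the label of the row holding the maximum, so the assignment lands on an existing row. The names `broken` and `fixed` are made up for this illustration.

```python
import pandas as pd

# Three rows taken from the sample data; the index is the default 0, 1, 2.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-05-01 04:00:00",
        "2000-05-02 04:00:00",
        "2000-05-03 04:00:00",
    ]),
    "price": [4.439730, 4.209830, 4.109380],
    "highest": 0,
    "lowest": 0,
})

# price.max() is a value (4.43973), not a row label. No row has that label,
# so .loc enlarges the frame and appends a new row.
broken = df.copy()
broken.loc[broken["price"].max(), "highest"] = 1

# idxmax() is the label of the row with the maximum price,
# so the assignment updates that existing row.
fixed = df.copy()
fixed.loc[fixed["price"].idxmax(), "highest"] = 1

print(len(df), len(broken), len(fixed))
```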
Answer 0 (score: 1)
I'm sure this isn't the best way to do it.
def updateRecord(dfIn, startDate, endDate):
    # .copy() so the flags are written to an independent frame, not a slice of dfIn
    df_o = dfIn.loc[(dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)].copy()
    if len(df_o) == 0:
        return dfIn
    # idxmax/idxmin return index labels, which is what .at expects
    idx = df_o['price'].idxmax()
    df_o.at[idx, 'highest'] = 1
    idx_l = df_o['price'].idxmin()
    df_o.at[idx_l, 'lowest'] = 1
    return df_o  # only the selected rows are returned
Hope this works.
Answer 1 (score: 0)
This works, but it returns only the selected rows. If you want the same behavior with the whole DataFrame returned, I can do that too.
Answer 2 (score: 0)
I think this is what you're looking for.
def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    if not mask.any():
        return dfIn
    # Take idxmax/idxmin over the masked rows only, not the entire DataFrame.
    # They return index labels, so .loc updates existing rows instead of adding new ones.
    dfIn.loc[dfIn.loc[mask, 'price'].idxmax(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask, 'price'].idxmin(), 'lowest'] = 1
    return dfIn
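As a quick check against the expected output in the question, here is a minimal sketch run on five rows from the sample data. It assumes `date` is a datetime column and the index labels are unique; with duplicate labels, `.loc` would flag more than one row. The function name `mark_extremes` and the choice to work on a copy are illustrative and not taken from the answers above.

```python
import pandas as pd

def mark_extremes(df, start_date, end_date):
    """Return a copy of df with the max/min price inside the date window flagged."""
    out = df.copy()
    # between() is inclusive on both ends by default.
    mask = out["date"].between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    if not mask.any():
        return out
    window = out.loc[mask, "price"]
    # idxmax/idxmin give row labels, so existing rows are updated in place.
    out.loc[window.idxmax(), "highest"] = 1
    out.loc[window.idxmin(), "lowest"] = 1
    return out

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-05-01 04:00:00",
        "2000-05-02 04:00:00",
        "2000-05-31 04:00:00",
        "2000-06-01 04:00:00",
        "2000-06-02 04:00:00",
    ]),
    "price": [4.439730, 4.209830, 3.000000, 3.183040, 3.305810],
    "highest": 0,
    "lowest": 0,
})

result = mark_extremes(df, "2000-05-01 04:00:00", "2000-05-31 04:00:00")
print(result)
```

The row count stays the same, and only the 2000-05-01 and 2000-05-31 rows are flagged. The whole operation is a boolean mask plus two reductions, so it stays vectorized and should scale reasonably to larger frames.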