I have a DataFrame and I'm struggling to create a column based on other columns. Here is the problem, illustrated with sample data.
             Date   Target1       Close
    0  2019-04-17  209.2440  203.130005
    1  2019-04-17  212.2155  203.130005
    2  2019-04-17  213.6330  203.130005
    3  2019-04-17  213.0555  203.130005
    4  2019-04-17  212.6250  203.130005
    5  2019-04-17  212.9820  203.130005
    6  2019-04-17  213.1395  203.130005
    7  2019-04-16  209.2860  199.250000
    8  2019-04-16  209.9055  199.250000
    9  2019-04-16  210.3045  199.250000
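I want to create another column for each observation (called days_to_hit_target, for example) that holds the number of days until Close hits, or comes very close to, that row's target. Once Close gets that close, the difference in days should be computed and stored in the days_to_hit_target column.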
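For anyone who wants to run the answers below, the sample above can be rebuilt roughly like this. It is only a sketch: the values are copied from the table, and parsing Date with pd.to_datetime is an assumption about how the real data is stored.

```python
import pandas as pd

# Values copied from the sample table in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-04-17"] * 7 + ["2019-04-16"] * 3),
    "Target1": [209.2440, 212.2155, 213.6330, 213.0555, 212.6250,
                212.9820, 213.1395, 209.2860, 209.9055, 210.3045],
    "Close": [203.130005] * 7 + [199.250000] * 3,
})

print(df.dtypes)
```

Note that in this sample every Target1 is above both Close values, so none of the targets is actually hit within these ten rows.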
Answer 0 (score: 2)
Note: I'm using Python 3.7.1 and pandas 0.23.4. What I came up with is pretty dirty, and I'm sure there is a neater and more efficient way.
    import numpy as np
    import pandas as pd

    ### Create sample data
    # closed="right" is the pandas 0.23.4 spelling; pandas 2.x uses inclusive="right" instead
    date_range = pd.date_range(start="1/1/2018", end="20/1/2018", freq="6H", closed="right")
    target1 = np.random.uniform(10, 30, len(date_range))
    # one random "close" value per block of 4 consecutive rows
    close = [[i]*4 for i in np.random.uniform(10, 30, len(date_range)//4)]
    close_flat = np.array([item for sublist in close for item in sublist])
    df = pd.DataFrame(np.array([np.array(date_range.date), target1,
                                close_flat]).transpose(), columns=["date", "target", "close"])

    ### Create the column you need
    # iterating over the days and finding days when the difference between
    # "close" of current day and all "target" is lower than 0.25 OR the "target"
    # value is greater than "close" value.
    thresh = 0.25
    date_diff_arr = np.zeros(len(df))
    for i in range(0, len(df), 4):
        diff_lt_thresh = df[(abs(df.target - df.close.iloc[i]) < thresh) | (df.target > df.close.iloc[i])]
        # only keep the findings from the next day onwards
        diff_lt_thresh = diff_lt_thresh.loc[i+4:]
        if not diff_lt_thresh.empty:
            # find day difference only if something under thresh is found
            days_diff = (diff_lt_thresh.iloc[0].date - df.iloc[i].date).days
        else:
            # otherwise write it as nan
            days_diff = np.nan
        # fill in the np.array which will be used to write to the df
        date_diff_arr[i:i+4] = days_diff
    df["date_diff"] = date_diff_arr
Sample output:
0 2018-01-01 21.64 26.7319 2.0
1 2018-01-01 22.9047 26.7319 2.0
2 2018-01-01 26.0945 26.7319 2.0
3 2018-01-02 10.2155 26.7319 2.0
4 2018-01-02 17.5602 11.0507 1.0
5 2018-01-02 12.0368 11.0507 1.0
6 2018-01-02 19.5923 11.0507 1.0
7 2018-01-03 21.8168 11.0507 1.0
8 2018-01-03 11.5433 16.8862 1.0
9 2018-01-03 27.3739 16.8862 1.0
10 2018-01-03 26.9073 16.8862 1.0
11 2018-01-04 19.6677 16.8862 1.0
12 2018-01-04 25.3599 27.3373 1.0
13 2018-01-04 22.7479 27.3373 1.0
14 2018-01-04 18.7246 27.3373 1.0
15 2018-01-05 25.4122 27.3373 1.0
16 2018-01-05 28.3294 23.8469 1.0
Answer 1 (score: 2)
This should work:
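    import pandas as pd

    # Assumes df.Date already holds datetimes (e.g. via pd.to_datetime),
    # so that the subtraction on the last line works.
    daysAboveTarget = []
    for i in range(len(df.Date)):
        try:
            # first date, from row i onwards, whose Close is above row i's target
            dayAboveTarget = df.iloc[i:].loc[(df.Close > df.Target1[i])]['Date'].iloc[0]
        except IndexError:
            # Close never goes above this target
            dayAboveTarget = None
        daysAboveTarget.append(dayAboveTarget)
    daysAboveTarget = pd.Series(daysAboveTarget)
    df['days_to_hit_target'] = daysAboveTarget - df.Date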
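I overused iloc and loc here, so let me explain. The variable dayAboveTarget gets the date on which the closing price is above the target. The first iloc subsets the DataFrame to future dates only, the first loc finds the actual matches, and the second iloc takes only the first match. The try/except handles the cases where the price never goes above the target in the following days.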
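To make the three indexing steps easier to follow, here they are split apart on a small made-up frame. This is only an illustrative sketch: the toy values and the helper name first_hit_days are hypothetical and not part of the answer, and the rows are assumed to be sorted by ascending date so that "from row i onwards" really means "in the future".

```python
import pandas as pd

# Hypothetical toy data (not from the question): one row per day, ascending dates.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-04-15", "2019-04-16", "2019-04-17", "2019-04-18"]),
    "Target1": [101.0, 105.0, 103.0, 110.0],
    "Close": [100.0, 102.0, 104.0, 106.0],
})

def first_hit_days(frame, i):
    """Hypothetical helper: days from row i until Close first exceeds row i's Target1."""
    future = frame.iloc[i:]                                        # step 1: rows from position i onwards
    hits = future.loc[future["Close"] > frame["Target1"].iloc[i]]  # step 2: rows that beat the target
    try:
        first_hit = hits["Date"].iloc[0]                           # step 3: earliest matching date
    except IndexError:
        return None                                                # no row ever beats the target
    return (first_hit - frame["Date"].iloc[i]).days

print([first_hit_days(toy, i) for i in range(len(toy))])  # [1, 2, 0, None]
```

The last row shows why the exception handling is needed: nothing closes above 110, so the filtered frame is empty and .iloc[0] raises IndexError.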
Answer 2 (score: 0)
Perhaps a faster solution:
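    import pandas as pd

    # df is your DataFrame
    df["Date"] = pd.to_datetime(df["Date"])
    # reset the index after sorting so that x.name equals each row's position
    df = df.sort_values("Date").reset_index(drop=True)

    def days_to_hit(x, no_hit_default=None):
        # scan forward from this row; return the day gap at the first Close >= Target1
        return next(
            ((df["Date"].iloc[j + x.name] - x["Date"]).days
             for j in range(len(df) - x.name)
             if df["Close"].iloc[j + x.name] >= x["Target1"]),
            no_hit_default)

    df["days_to_hit_target"] = df.apply(days_to_hit, axis=1)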
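If the row-by-row apply is still too slow, the same "first Close at or above Target1, from this row onwards" rule can be sketched with NumPy broadcasting. This is an alternative sketch, not something taken from the answers above: the function name days_to_hit_vectorized and the toy data are made up, Date is assumed to be a timezone-naive datetime column, and the n-by-n boolean matrix makes it practical only for moderately sized frames.

```python
import numpy as np
import pandas as pd

def days_to_hit_vectorized(df):
    """Hypothetical helper: days until Close first reaches each row's Target1 (NaN if never)."""
    df = df.sort_values("Date").reset_index(drop=True)
    dates = df["Date"].to_numpy()
    close = df["Close"].to_numpy()
    target = df["Target1"].to_numpy()
    n = len(df)
    # hit[i, j] is True when row j is at or after row i and Close[j] >= Target1[i]
    hit = (close[None, :] >= target[:, None]) & np.triu(np.ones((n, n), dtype=bool))
    first = hit.argmax(axis=1)   # position of the first True per row (0 if there is none)
    found = hit.any(axis=1)      # distinguishes "no hit" from "hit at position 0"
    days = (dates[first] - dates) / np.timedelta64(1, "D")
    df["days_to_hit_target"] = np.where(found, days, np.nan)
    return df

# Hypothetical toy data, one row per day.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-04-15", "2019-04-16", "2019-04-17", "2019-04-18"]),
    "Target1": [101.0, 105.0, 103.0, 110.0],
    "Close": [100.0, 102.0, 104.0, 106.0],
})
result = days_to_hit_vectorized(toy)
print(result["days_to_hit_target"].tolist())  # [1.0, 2.0, 0.0, nan]
```

Like the apply version, a row whose own Close already meets its target counts as a hit after 0 days; use np.triu(..., k=1) instead if only later rows should count.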