Question

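Major edit:

OK, so I have a time series DataFrame at minute resolution. For example, this DataFrame holds one year of data. I am trying to build an analysis model that iterates over this data one day at a time.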

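The function will: 1) slice one day of data from the DataFrame; 2) create a 30-minute sub-slice (the first 30 minutes of the day) from the daily slice; 3) pass the data from both slices to the analysis part of the function; 4) append the result to a new DataFrame; 5) keep iterating until done.

The DataFrame has the following format: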
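
The five steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sample data is made up, `analyze` is a hypothetical placeholder for the real analysis, and "first 30 minutes" is measured from each day's first bar.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data: two sessions of 60 one-minute bars each.
idx = pd.date_range("2015-01-06 14:31", periods=60, freq="min", tz="UTC").append(
    pd.date_range("2015-01-07 14:31", periods=60, freq="min", tz="UTC"))
df = pd.DataFrame({"price": np.linspace(46.0, 47.0, len(idx))}, index=idx)

def analyze(day, first30):
    # Hypothetical placeholder for the analysis step; returns one row of results.
    return {"day_rows": len(day), "first30_rows": len(first30)}

results = {}
# Step 1: grouping on the normalized index yields one slice per calendar day.
for date, day in df.groupby(df.index.normalize()):
    # Step 2: the first 30 minutes, measured from the day's first bar
    # (label slicing with .loc is inclusive at both ends).
    start = day.index[0]
    first30 = day.loc[start:start + pd.Timedelta(minutes=29)]
    # Step 3: pass both slices to the analysis.
    results[date] = analyze(day, first30)

# Steps 4 and 5: collect all rows into a new frame, one row per day.
daily = pd.DataFrame.from_dict(results, orient="index")
```

Collecting rows in a dict and building the frame once at the end avoids appending to a DataFrame inside the loop.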

                           open_price    high    low  close_price    volume   price
2015-01-06 14:31:00+00:00     46.3800  46.440  46.29       46.380  560221.0  46.380
2015-01-06 14:32:00+00:00     46.3800  46.400  46.30       46.390   52959.0  46.390
2015-01-06 14:33:00+00:00     46.3900  46.495  46.36       46.470  100100.0  46.470
2015-01-06 14:34:00+00:00     46.4751  46.580  46.41       46.575   85615.0  46.575
2015-01-06 14:35:00+00:00     46.5800  46.610  46.53       46.537   78175.0  46.537

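It seems to me that the pandas DatetimeIndex functionality is the best way to do this, but I don't know where to start.

(1) It looks like I could use the .rollforward functionality, starting at the DataFrame's first date/time and rolling forward one day on each iteration.

(2) Use df.loc[mask] to create the sub-slice.

I'm fairly sure I can figure it out after (2), but again, I'm not very familiar with time series analysis or the pandas DatetimeIndex functionality.

Final DataFrame: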
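
For reference, a minimal sketch of ideas (1) and (2) on made-up data. Instead of .rollforward it steps forward with a plain `pd.Timedelta`, which is a substitution rather than the approach named above; all variable names here are illustrative.

```python
import pandas as pd

# Hypothetical data: 120 one-minute bars on a single day.
idx = pd.date_range("2015-01-06 14:31", periods=120, freq="min", tz="UTC")
df = pd.DataFrame({"price": range(120)}, index=idx)

# (1) Start at the first timestamp's midnight and step forward one day.
day_start = df.index[0].normalize()
day_end = day_start + pd.Timedelta(days=1)

# (2) Boolean mask selecting that day, then the opening 30 minutes of it.
day_mask = (df.index >= day_start) & (df.index < day_end)
day = df.loc[day_mask]
open_mask = day.index < day.index[0] + pd.Timedelta(minutes=30)
opening = day.loc[open_mask]

# between_time selects the same clock-time window on every day at once
# (both end points are included by default).
same_window = df.between_time("14:31", "15:00")
```

In a loop, `day_start` and `day_end` would each advance by one day per iteration until the end of the index is reached.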

              high     low   retrace  time
2015-01-06    46.440  46.29  True     47
2015-01-07    46.400  46.30  True     138
2015-01-08    46.495  46.36  False    NaN
2015-01-09    46.580  46.41  True     95
2015-01-10    46.610  46.53  False    NaN

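high = the high of the first 30 minutes of the day.

low = the low of the first 30 minutes of the day.

retrace = Boolean, True if the price returns to the opening price at some point after the first 30 minutes.

time = the time (in minutes) the retrace took.

Here is the code that seems to work (thanks to everyone for the help!):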
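
One possible reading of these four definitions, as a sketch only. It assumes that "returns to the opening price" means a later bar whose low-to-high range contains the day's first open_price, and that time is counted in minutes from the day's first bar. The helper names and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def opening_range(day, minutes=30):
    """Summarize one day of minute bars (columns: open_price, high, low)."""
    start = day.index[0]
    cutoff = start + pd.Timedelta(minutes=minutes)
    first = day.loc[day.index < cutoff]    # first 30 minutes
    rest = day.loc[day.index >= cutoff]    # remainder of the day
    open_price = day["open_price"].iloc[0]
    # Assumption: a retrace is any later bar whose range contains the open.
    hit = rest[(rest["low"] <= open_price) & (rest["high"] >= open_price)]
    retrace = not hit.empty
    # Assumption: time is measured from the first bar of the day.
    time = (hit.index[0] - start).total_seconds() / 60 if retrace else np.nan
    return {"high": first["high"].max(), "low": first["low"].min(),
            "retrace": retrace, "time": time}

def make_day(date, lows, highs):
    # Hypothetical helper that builds one day of made-up bars.
    idx = pd.date_range(f"{date} 14:31", periods=len(lows), freq="min", tz="UTC")
    return pd.DataFrame({"open_price": lows, "high": highs, "low": lows}, index=idx)

df = pd.concat([
    # Day 1 drops back through its open (10.0) 45 minutes after the first bar.
    make_day("2015-01-06", [10.0] * 30 + [10.2] * 15 + [9.9] * 15,
             [10.5] * 30 + [10.6] * 15 + [10.3] * 15),
    # Day 2 stays above its open (20.0) after the first 30 minutes.
    make_day("2015-01-07", [20.0] * 30 + [20.2] * 30, [20.5] * 30 + [20.6] * 30),
])

rows = {date: opening_range(day) for date, day in df.groupby(df.index.normalize())}
result = pd.DataFrame.from_dict(rows, orient="index")
```

With this data the first day reports a retrace after 45 minutes and the second day reports none, so its time is NaN.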

import numpy as np
import pandas as pd

# msft_prices, s_date and e_date are defined elsewhere.
# .ix has been removed from pandas; .loc does the label-based slicing.
sample = msft_prices.loc[s_date:e_date]
# Keep only the calendar days that contain data, as 'YYYY-MM-DD' strings.
sample = sample.resample('D').mean()
sample = sample.dropna()
sample = sample.index.strftime('%Y-%m-%d')

def hi_lo(prices):
    rows = []  # one [High, Low, Retraced, Time] row per processed day
    days = []  # matching labels, so the rows and the index always line up
    for i in sample:
        # Opening range: 14:30 to 15:00.
        ORTDF = prices.loc[i + ' 14:30':i + ' 15:00']
        if ORTDF.empty:
            continue
        ORH = ORTDF['high'].max()
        ORHK = ORTDF['high'].idxmax()
        ORL = ORTDF['low'].min()
        ORLK = ORTDF['low'].idxmin()
        # Full session: 14:30 to 21:00.
        dailydf = prices.loc[i + ' 14:30':i + ' 21:00']
        # Default: not touched. This also covers ORHK == ORLK, which the
        # earlier version skipped, leaving rows and index out of step.
        touched, time_to_touch = 0, np.nan

        if ORHK < ORLK and dailydf['high'].max() > ORH:
            # High came first: minutes from the opening-range high to the session high.
            ORDHK = dailydf['high'].idxmax()
            touched = 1
            time_to_touch = (ORDHK - ORHK).total_seconds() / 60
        elif ORHK > ORLK and dailydf['low'].min() < ORL:
            # Low came first: minutes from the opening-range low to the session low.
            ORDLK = dailydf['low'].idxmin()
            touched = 1
            time_to_touch = (ORDLK - ORLK).total_seconds() / 60

        rows.append([ORH, ORL, touched, time_to_touch])
        days.append(i)

    return pd.DataFrame(rows, columns=['High', 'Low', 'Retraced', 'Time'], index=days)

It may not be the most elegant way, but hey, it works!

Answer 1

Read the docs for general reference.

Setup (next time, please provide this yourself in the question!):

import numpy as np
import pandas as pd

dates = pd.to_datetime(['19 November 2010 9:01', '19 November 2010 9:02', '19 November 2010 9:03',
                        '20 November 2010 9:05', '20 November 2010 9:06', '20 November 2010 9:07'])
df = pd.DataFrame({'high_price': [3., 1.8, 1.21, 4., 4.01, 1.201],
                   'low_price': [1.2, 1.8, 1.21, 2., 4., 1.201]}, index=dates)
df

                    high_price  low_price
2010-11-19 09:01:00     3.000   1.200
2010-11-19 09:02:00     1.800   1.800
2010-11-19 09:03:00     1.210   1.210
2010-11-20 09:05:00     4.000   2.000
2010-11-20 09:06:00     4.010   4.000
2010-11-20 09:07:00     1.201   1.201

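I would group by day and then apply a function to each day that works out whether there was a retrace and when it happened. Your question doesn't make clear which column to operate on or what tolerance counts as "the price is the same", so I've made both of them options: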

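def retrace_per_day(day, col='high_price', epsilon=0.5):
    """Take one day of data and report whether there was a retrace.
    If yes, return 1 and the minute in which it happened.
    Otherwise return 0 and np.nan."""
    # Rows whose value is within epsilon of the day's first value
    # (.iloc[0] is positional; day[col][0] would be a label lookup).
    cond = (np.abs(day[col] - day[col].iloc[0]) < epsilon)
    cond_index = cond[cond].index
    # The first row always matches itself, so a retrace needs a second match.
    if len(cond_index) > 1:
        retrace, period = 1, cond_index[1]
    else:
        retrace, period = 0, np.nan
    return pd.Series({'period': period, 'retrace': retrace})

# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it.
df.groupby(pd.Grouper(freq='1D')).apply(retrace_per_day)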

                         period  retrace
2010-11-19                  NaN      0.0
2010-11-20  2010-11-20 09:06:00      1.0

You can then merge this back into the original DataFrame as needed.
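
One way that merge could look, as a sketch: the `daily` frame below is a hypothetical stand-in for the grouped result, indexed by midnight timestamps, and each minute row looks up the value for its own day.

```python
import pandas as pd

dates = pd.to_datetime(["2010-11-19 09:01", "2010-11-19 09:02",
                        "2010-11-20 09:05", "2010-11-20 09:06"])
df = pd.DataFrame({"high_price": [3.0, 1.8, 4.0, 4.01]}, index=dates)

# Hypothetical daily result, one row per day, indexed by midnight timestamps.
daily = pd.DataFrame({"retrace": [0, 1]},
                     index=pd.to_datetime(["2010-11-19", "2010-11-20"]))

# Map every minute row to its day, look that day up in the daily frame,
# and assign by position so the minute-level index is left untouched.
merged = df.copy()
merged["retrace"] = daily["retrace"].reindex(df.index.normalize()).to_numpy()
```

Every row of a given day then carries that day's retrace flag alongside the minute-level prices.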

Iterating through a time series one day at a time