pandas - Best way to iterate over a year/week/day index for fast performance

Time: 2017-08-04 09:29:29

Tags: python loops pandas dataframe

I have a typical financial DataFrame with 'Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Mean' and 'Volume' columns at 1-minute frequency (1.2M rows per df, 500+ dfs).

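I need to iterate over this DataFrame year by year, week of the year by week of the year, and day of the week by day of the week.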

Here is the code I have written so far:

import os
import pandas as pd


for file in os.listdir(data_path):
    if file.endswith('.csv'):

        df = pd.read_csv(data_path + file, parse_dates=[['Date', 'Time']])
        df.columns = ['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']  # renamed the Date_Time column
        df['Timestamp'] = pd.to_datetime(df['Timestamp'])
        df['Mean'] = round(df[['Open', 'High', 'Low', 'Close']].mean(axis=1), 2)

        df['Year'] = [0] * len(df)
        df['Week'] = [0] * len(df)
        df['Day'] = [0] * len(df)
        for i in range(len(df)):
            # .at writes to df directly; chained assignment such as
            # df['Year'][i] = ... may silently write to a copy
            year, week, day = df.at[i, 'Timestamp'].isocalendar()
            df.at[i, 'Year'] = year
            df.at[i, 'Week'] = week
            df.at[i, 'Day'] = day

        index = pd.MultiIndex.from_arrays([df['Year'], df['Week'], df['Day']], names=['Year', 'Week', 'Day'])

        # build a new df from df with index as MultiIndex and save it in .hdf format
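
The row-by-row loop that fills 'Year', 'Week' and 'Day' is likely the slowest part of the loading step. A minimal sketch of a vectorized alternative follows, assuming pandas 1.1 or later (where `Series.dt.isocalendar()` is available). The sample DataFrame and the name `indexed` are hypothetical stand-ins, not part of the original code.

```python
import pandas as pd

# Hypothetical sample standing in for one of the 1-minute CSV files.
df = pd.DataFrame({
    "Timestamp": pd.date_range("2016-12-30 09:00", periods=6, freq="D"),
    "Open": [10.0, 10.5, 11.0, 11.5, 12.0, 12.5],
})

# Series.dt.isocalendar() computes ISO year, week and day for the whole
# column at once and returns them as a DataFrame with columns
# 'year', 'week' and 'day'. Cast to plain int64 for simple indexing.
iso = df["Timestamp"].dt.isocalendar().astype("int64")

index = pd.MultiIndex.from_arrays(
    [iso["year"], iso["week"], iso["day"]],
    names=["Year", "Week", "Day"],
)

# Sorting the MultiIndex keeps partial lookups such as .loc[(2017, 1)] fast.
indexed = df.set_index(index).sort_index()

# All rows of ISO week 1 of 2017.
print(indexed.loc[(2017, 1), "Open"])
```

Note that the ISO year can differ from the calendar year around New Year: in this sample, 2017-01-01 is a Sunday and belongs to ISO week 52 of 2016.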

I then use this 3-level index to access the data with 3 for loops, as follows:

import numpy as np

# asset.data is assumed to hold the (Year, Week, Day)-indexed DataFrame

# years cycle
years_array = asset.data.index.levels[0].values
for year in years_array:

    # weeks cycle
    # weeks actually present in this year; index.labels was renamed to
    # index.codes in pandas 0.24, so read the level values directly
    weeks_array = np.unique(asset.data.loc[year].index.get_level_values(0))
    for week in weeks_array:

        week0 = asset.data.loc[year, week].Open.values
        mean0 = np.mean(week0)

        if week != weeks_array[-1]:
            week1_year = year
            week1_week = week + 1
        elif year != years_array[-1]:
            week1_year = year + 1
            week1_week = 1
        else:
            # last week of the last year: there is no following week
            break

        # minutes cycle
        week1 = asset.data.loc[week1_year, week1_week].Open.values
        for minute in range(len(week1)):
            # do da magic stuff...
            pass
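
The nested loops above can also be expressed as a walk over consecutive (week0, week1) pairs. The sketch below is one possible way to do that with `groupby`, not a definitive solution. It assumes pandas 1.1 or later, uses small hypothetical daily data instead of 1-minute bars, and replaces the unspecified per-minute step with a simple placeholder computation.

```python
import pandas as pd

# Hypothetical data: 21 daily rows starting on Monday 2017-01-02,
# i.e. ISO weeks 1, 2 and 3 of 2017.
df = pd.DataFrame({
    "Timestamp": pd.date_range("2017-01-02", periods=21, freq="D"),
    "Open": range(21),
})

iso = df["Timestamp"].dt.isocalendar().astype("int64")
df["Year"] = iso["year"]
df["Week"] = iso["week"]

# groupby sorts its keys, so the groups come out in (Year, Week) order
# and the year rollover needs no special casing.
weeks = list(df.groupby(["Year", "Week"]))

results = []
for (key0, week0), (key1, week1) in zip(weeks, weeks[1:]):
    mean0 = week0["Open"].mean()
    # Placeholder for the per-minute logic: distance of every Open value
    # in the current week from the previous week's mean, computed at once.
    diff = week1["Open"].to_numpy() - mean0
    results.append((key1, mean0, diff))

for key1, mean0, diff in results:
    print(key1, mean0, diff)
```

Pairing neighbouring groups means "the previous week present in the data". If a whole week is missing from a file, the pair will span the gap, which may or may not be what is wanted.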

Is there a smarter way to do this? This is StackOverflow, so I'm pretty sure there is one.

What I really need is an easy way to get the previous week's data (week0 in my code) from my current position within the current week (week1 in my code).
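
If only an aggregate of the previous week is needed at each row (such as the mean of 'Open', like mean0 in the code above), it can be attached to every row up front so that no lookup is needed while iterating. This is a minimal sketch under that assumption, using pandas 1.1 or later and hypothetical daily data. The column name `PrevWeekMean` is made up for illustration.

```python
import pandas as pd

# Hypothetical data: 21 daily rows starting on Monday 2016-12-19, covering
# ISO weeks 51 and 52 of 2016 and week 1 of 2017 (a year rollover).
df = pd.DataFrame({
    "Timestamp": pd.date_range("2016-12-19", periods=21, freq="D"),
    "Open": [float(i) for i in range(21)],
})

iso = df["Timestamp"].dt.isocalendar().astype("int64")
df["Year"] = iso["year"]
df["Week"] = iso["week"]

# One mean per (Year, Week), in chronological order.
weekly_mean = df.groupby(["Year", "Week"])["Open"].mean()

# shift(1) moves each week's mean down to the following week, so every
# week is labelled with the mean of the week before it.
prev_week_mean = weekly_mean.shift(1).rename("PrevWeekMean")

# Broadcast the weekly value back onto the original rows.
df = df.join(prev_week_mean, on=["Year", "Week"])

print(df[["Timestamp", "Open", "PrevWeekMean"]])
```

Rows in the first week of the data have no previous week, so their `PrevWeekMean` is NaN.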

Thanks for your help!

0 answers:

No answers