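I have a typical financial DataFrame with 'Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Mean' and 'Volume' columns at 1-minute frequency (about 1.2M rows per DataFrame, 500+ DataFrames).
I need to iterate over this DataFrame year by year, within each year week by week, and within each week day by day.
Here is the code I have written so far: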
import os
import pandas as pd

for file in os.listdir(data_path):
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join(data_path, file))
        # combine the Date and Time columns into a single leading Timestamp column
        df.insert(0, 'Timestamp', pd.to_datetime(df.pop('Date') + ' ' + df.pop('Time')))
        df.columns = ['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']
        df['Mean'] = df[['Open', 'High', 'Low', 'Close']].mean(axis=1).round(2)
        # ISO year / week / weekday for the whole column at once (needs pandas >= 1.1),
        # instead of a per-row loop with chained assignment
        iso = df['Timestamp'].dt.isocalendar()
        df['Year'] = iso['year']
        df['Week'] = iso['week']
        df['Day'] = iso['day']
        index = pd.MultiIndex.from_arrays([df['Year'], df['Week'], df['Day']], names=['Year', 'Week', 'Day'])
        # build a new df from df with index as MultiIndex and save it in .hdf format
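A minimal sketch of that last indexing step, on synthetic data rather than the real files. The toy frame and the names `toy` and `indexed` are made up for illustration, and the HDF export is left out:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute data covering three full ISO weeks (illustrative only).
ts = pd.date_range('2021-01-04', periods=3 * 7 * 24 * 60, freq='min')
toy = pd.DataFrame({'Timestamp': ts, 'Open': np.arange(len(ts), dtype=float)})

# ISO year / week / weekday for the whole column at once (pandas >= 1.1),
# cast to plain integers so that lookups with Python ints are unambiguous.
iso = toy['Timestamp'].dt.isocalendar().astype(int)

# Use the three ISO components as a sorted 3-level index.
indexed = toy.set_index([iso['year'].rename('Year'),
                         iso['week'].rename('Week'),
                         iso['day'].rename('Day')]).sort_index()

# All rows of ISO week 2 of 2021, then only the Monday of that week.
week2 = indexed.loc[(2021, 2)]
monday = indexed.loc[(2021, 2, 1)]
```

Sorting the index once up front should keep the `.loc` lookups on partial keys efficient.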
Then I use this 3-level index to access the data with three nested for loops, like this:
import numpy as np

# years cycle
years_array = asset.data.index.levels[0].values
for year in years_array:
    # weeks cycle
    # week numbers present in this year (MultiIndex.labels was renamed to .codes in newer pandas)
    weeks_array = np.unique(asset.data.loc[year].index.get_level_values('Week'))
    for week in weeks_array:
        week0 = asset.data.loc[year, week].Open.values
        mean0 = np.mean(week0)
        if week != weeks_array[-1]:
            week1_year = year
            week1_week = week + 1
        elif (week == weeks_array[-1]) and (year != years_array[-1]):
            week1_year = year + 1
            week1_week = 1
        elif (week == weeks_array[-1]) and (year == years_array[-1]):
            break
        # minutes cycle
        week1 = asset.data.loc[week1_year, week1_week].Open.values
        for minute in range(len(week1)):
            # do da magic stuff...
            pass
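Is there a smarter way to do this? This is StackOverflow, so I'm pretty sure there is one.
What I really need is an easy way to get the previous week's data (week0 in my code) based on my current position within the current week (week1 in my code).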
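To make the requirement concrete, here is a rough sketch of one possible shape for it, on synthetic data and not as a definitive solution. It groups the rows by (ISO year, ISO week) and walks consecutive pairs of groups, so the year rollover needs no special-casing. The names `toy`, `weeks` and `prev_means` are hypothetical:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute data: four full ISO weeks straddling a year boundary.
ts = pd.date_range('2020-12-21', periods=4 * 7 * 24 * 60, freq='min')
toy = pd.DataFrame({'Timestamp': ts, 'Open': np.arange(len(ts), dtype=float)})

# One group per (ISO year, ISO week); groupby sorts the keys, so the
# groups come out in chronological order.
iso = toy['Timestamp'].dt.isocalendar().astype(int)
weeks = list(toy.groupby([iso['year'], iso['week']]))

prev_means = {}
# Pair every week (week1) with the week just before it (week0).
for (_, week0), (key1, week1) in zip(weeks, weeks[1:]):
    mean0 = week0['Open'].mean()
    prev_means[key1] = mean0
    for open_price in week1['Open'].to_numpy():
        pass  # placeholder for the per-minute logic that uses mean0
```

If only per-week aggregates of the previous week are needed, something like `toy.groupby([iso['year'], iso['week']])['Open'].mean().shift(1)` might avoid the outer Python loop altogether.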
Thanks for your help!