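For context, my main dataset is a DataFrame of 24541 rows x 1830 columns, filled with either NaN or floats (stock prices). I process this DataFrame 11 times, each time setting values in a transformed DataFrame that has the same index and columns. Examples of the two DataFrames follow: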
data = pd.read_csv(filepath, index_col=0, parse_dates=True)  # DataFrame.from_csv was removed in pandas 1.0
data = pd.DataFrame(data=data, dtype=np.float64)
#dataset of daily prices
data.head()
Out[14]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-02 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-04 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-05 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-06 NaN NaN NaN NaN ... NaN NaN NaN NaN
[5 rows x 1830 columns]
MA_a_frame = pd.DataFrame(
    data=0,
    index=data.index,
    columns=data.columns)
#bool DataFrame
MA_a_frame.head()
Out[15]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 0 0 0 0 ... 0 0 0 0
1926-01-02 0 0 0 0 ... 0 0 0 0
1926-01-04 0 0 0 0 ... 0 0 0 0
1926-01-05 0 0 0 0 ... 0 0 0 0
1926-01-06 0 0 0 0 ... 0 0 0 0
[5 rows x 1830 columns]
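The values in MA_a_frame (and in 10 other identical DataFrames) are set to 1 if a specific condition on the DataFrame "data" is met: namely, if the price in "data" is within 1% of a value computed in an entirely different DataFrame produced by an earlier function (the tolerance parameter is "j"). So in total, each iteration handles up to 3 large DataFrames.
As for my iterators, I simply build two separate lists ("dates" and "securities") from data.index and data.columns, so I am essentially iterating over the index and columns of data indirectly. Without further ado, here is the basis of the code that runs 11 times in total in my program (this is the part I want to speed up!):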
def gen_a():
    for date in dates:
        for security in securities:
            try:
                if type(data.loc[date, security]) is not float:
                    pass
                    #lots of the data is NaN, so skip these altogether
                elif j > math.log(
                        MA_a_csv.loc[date, security]/
                        data.loc[date, security]) > -j:
                    MA_dict['a'].loc[date, security] = 1
                    print(f'Passed {date}, {security}')
            except:
                print(f'Failed {date}, {security}')
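Now, the problem is that one cycle of this code takes about 8 hours, so each full run costs me close to 90 hours. I have an academic paper due as a graduation requirement, and with the deadline approaching these numbers really scare me! Everything would be fine if my output came out perfect on the first run, but if anyone has suggestions for cutting the runtime I would be eternally grateful. Otherwise I may have to narrow the scope of my data, which would reduce the power of my statistical analysis.
P.S. I am running this through Spyder on Windows 10 with an Intel i7 3970X. I have no other computing power available. I considered GPU acceleration, but my GPU is a GTX 670, which is not Pascal and therefore not compatible with cuDF.
Edit:
Here are the bottom five rows of the data DataFrame: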
s.head()
Out[16]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 61.55 232.70000 NaN NaN ... NaN 15.71 NaN NaN
2018-12-26 65.11 244.59000 NaN NaN ... NaN 16.48 NaN NaN
2018-12-27 64.71 252.17999 NaN NaN ... NaN 16.71 NaN NaN
2018-12-28 64.96 249.64999 NaN NaN ... NaN 16.55 NaN NaN
2018-12-31 66.09 254.50000 NaN NaN ... NaN 16.74 NaN NaN
[5 rows x 1830 columns]
Here is an example of one of the comparison DataFrames:
Out[23]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 76.3430 258.376200 NaN NaN ... NaN 19.8672 NaN NaN
2018-12-26 75.9530 258.143600 NaN NaN ... NaN 19.7980 NaN NaN
2018-12-27 75.5552 258.127199 NaN NaN ... NaN 19.7238 NaN NaN
2018-12-28 75.1382 257.878799 NaN NaN ... NaN 19.6440 NaN NaN
2018-12-31 74.7716 257.683199 NaN NaN ... NaN 19.5600 NaN NaN
[5 rows x 1830 columns]
Edit 2:
As requested, here is data.head().to_dict():
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'44792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85753': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12044': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20239': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28433': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12052': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
Timestamp('1926-01-02 00:00:00'): 326.5,
Timestamp('1926-01-04 00:00:00'): 325.0,
Timestamp('1926-01-05 00:00:00'): 325.5,
Timestamp('1926-01-06 00:00:00'): 326.25},
'12062': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12067': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77605': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77606': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20263': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12073': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12076': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
Timestamp('1926-01-02 00:00:00'): 124.25,
Timestamp('1926-01-04 00:00:00'): 127.125,
Timestamp('1926-01-05 00:00:00'): 123.75,
Timestamp('1926-01-06 00:00:00'): 124.5},
'61241': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12095': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28484': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77644': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28505': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77659': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12124': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77661': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28513': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61284': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77668': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12140': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85869': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28548': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77702': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12167': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85908': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
Timestamp('1926-01-02 00:00:00'): 78.0,
Timestamp('1926-01-04 00:00:00'): 77.5,
Timestamp('1926-01-05 00:00:00'): 76.875,
Timestamp('1926-01-06 00:00:00'): 76.5},
'44951': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85913': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85914': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12191': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20386': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77730': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28580': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85926': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20394': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69550': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12212': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20407': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20415': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77768': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85963': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45014': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61399': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69607': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85991': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20474': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20482': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86021': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69649': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12308': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20503': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86041': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12319': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20511': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12345': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20554': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12369': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20562': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86102': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20570': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86111': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
Timestamp('1926-01-02 00:00:00'): 124.0,
Timestamp('1926-01-04 00:00:00'): 123.25,
Timestamp('1926-01-05 00:00:00'): 123.5,
Timestamp('1926-01-06 00:00:00'): 122.75},
'36978': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86136': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28804': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86158': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61583': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20626': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77976': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53401': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86176': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12449': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69796': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12456': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12458': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20650': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28847': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
...}
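Unfortunately I don't have enough room in this post, but MA_a_csv.head().to_dict() produces the same result as above, except that every value is NaN rather than containing any data points.
Answer 0 (score: 1)
I built my own sample data generator based on the examples you provided. I think it fits what you have, but let me know if it doesn't. As long as the data matches, don't worry about how I made it.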
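import numpy as np
import pandas as pd

rows = 6
cols = 5
np.random.seed(0)
# pd.date_range replaces the DatetimeIndex(start=..., periods=...) constructor form,
# which pandas 1.0 and later no longer accept
data = pd.DataFrame(np.random.rand(rows, cols) * 100,
                    index=pd.date_range(start='1928-12-31', periods=rows, freq='D'))
nan_cols = len(data.columns) // 2
# pick random (date, column) pairs; everything up to that date in that column becomes NaN
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                     .sample(nan_cols, random_state=1, replace=True),
                     pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
    data.loc[:row, col] = np.nan
# comparison frame: each price perturbed by up to 2% in either direction
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50
                        * np.random.choice([-1, 1], size=(rows, cols))))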
So data looks like:
0 1 2 3 4
1928-12-31 54.881350 71.518937 NaN 54.488318 NaN
1929-01-01 64.589411 43.758721 NaN 96.366276 38.344152
1929-01-02 79.172504 52.889492 56.804456 92.559664 7.103606
1929-01-03 8.712930 2.021840 83.261985 77.815675 87.001215
1929-01-04 97.861834 79.915856 46.147936 78.052918 11.827443
1929-01-05 63.992102 14.335329 94.466892 52.184832 41.466194
and MA_a_csv looks like:
0 1 2 3 4
1928-12-31 55.171734 72.626384 NaN 55.107778 NaN
1929-01-01 63.791557 44.294412 NaN 98.185186 38.867028
1929-01-02 78.603241 53.351780 57.597027 92.448175 7.008877
1929-01-03 8.829794 2.013333 83.047291 77.324770 86.368349
1929-01-04 98.977844 80.616881 45.235708 77.893620 11.876852
1929-01-05 63.785651 14.522579 94.945445 52.671519 41.668902
I ran it through a routine similar to your gen_a, then wrote a vectorized version that gives the same answer:
logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})
ans is:
0 1 2 3 4
1928-12-31 1 0 0 0 0
1929-01-01 0 0 0 0 0
1929-01-02 1 1 0 1 0
1929-01-03 0 1 1 1 1
1929-01-04 0 1 0 1 1
1929-01-05 1 0 1 1 1
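np.log can operate on an entire array at once, and pandas likewise performs the greater-than comparisons on whole vectors at a time. The & operator is applied elementwise, so it simply checks that both conditions hold at each position.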
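A small sketch of how the elementwise comparison treats missing values, using two made-up 2x2 frames (the names prices and averages, their values, and the tolerance j = 0.01 are assumptions for illustration, not taken from the question). Comparisons against NaN evaluate to False, so NaN cells drop out without an explicit skip:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data and MA_a_csv
prices = pd.DataFrame({"A": [100.0, np.nan], "B": [50.0, 200.0]})
averages = pd.DataFrame({"A": [100.5, 101.0], "B": [60.0, np.nan]})
j = 0.01  # assumed tolerance

logs = np.log(averages / prices)    # NaN wherever either input is NaN
within = (logs < j) & (logs > -j)   # any comparison against NaN is False
print(within)
```

Only the top-left cell is flagged here: it is the only position where both values are present and the log ratio lies inside the tolerance.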
This is about 180 times faster than my version of gen_a (which has no try/except or print statements), so for your code it should be an even bigger improvement.
You also don't need the .replace({True: 1, False: 0}) part: in Python, 1 == True and 0 == False both evaluate to True, so you should be able to use them interchangeably.
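As a brief illustration of that interchangeability, the sketch below uses a made-up boolean frame (mask is a hypothetical name). Booleans sum like integers, and astype(int) gives explicit 0/1 values if a later step needs them:

```python
import pandas as pd

# Hypothetical boolean result, standing in for the comparison output
mask = pd.DataFrame({"A": [True, False], "B": [False, True]})

hits_per_column = mask.sum()   # True counts as 1, False as 0
as_int = mask.astype(int)      # explicit 0/1 frame, if integers are required
print(hits_per_column)
print(as_int)
```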
Let me know if you have any questions about this. For further reading I recommend Tom Augspurger's Modern Pandas articles; the Fast Pandas part is especially applicable.
Answer 1 (score: 0)
Combining two short comments into one answer.
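1) The statement
j > math.log(
    MA_a_csv.loc[date, security]/
    data.loc[date, security]) > -j
can be simplified slightly by taking the absolute value, e.g. j > abs(...), and can probably be sped up considerably by computing the logs separately, once, and using the fact that log(a/b) == log(a) - log(b).
Even if each cell is only calculated once per run, you could compute the logs and write them back to disk, which speeds up re-runs.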
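One way this suggestion might look when applied to whole frames rather than cell by cell. This is a sketch under assumptions: prices and averages are made-up stand-ins for data and MA_a_csv, and j = 0.01 is an assumed tolerance. The log of the price frame is computed once and could be reused for each comparison frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data and one comparison frame
prices = pd.DataFrame({"A": [100.0, np.nan], "B": [50.0, 200.0]})
averages = pd.DataFrame({"A": [100.5, 101.0], "B": [60.0, 201.0]})
j = 0.01  # assumed tolerance

log_prices = np.log(prices)      # computed once, reusable for every comparison frame
log_averages = np.log(averages)

# log(a/b) == log(a) - log(b); abs() folds the two-sided check into one comparison
flags = ((log_averages - log_prices).abs() < j).astype(int)
print(flags)
```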
2) If your real code contains those print statements, they will account for a considerable fraction of the total time.
Answer 2 (score: -1)
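When reading the csv, perhaps use the chunksize parameter. You will need to experiment to find the best size, but a good rule of thumb I have heard is to size it to about half of your available memory. Note that chunksize is a number of rows, and that with it read_csv returns an iterator of DataFrames rather than a single DataFrame.
df = pd.read_csv("your.csv", chunksize=memory/2)
When writing the results back to a file, make sure the append mode is set:
df.to_csv("yourresults.csv", mode='a')
Either delete the file every time you run the code, or make sure the first call to df.to_csv is done in write mode (the default).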
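A minimal sketch of the chunked read and append-write pattern described here, assuming a tiny throwaway CSV created in a temporary directory. The file contents, the chunk size of 4 rows, and the doubling step are placeholders, not the asker's data or computation:

```python
import os
import tempfile

import pandas as pd

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "your.csv")         # hypothetical input file
dst = os.path.join(tmp_dir, "yourresults.csv")  # hypothetical output file
pd.DataFrame({"x": range(10)}).to_csv(src, index=False)

# With chunksize set, read_csv yields DataFrames of at most that many rows
for i, chunk in enumerate(pd.read_csv(src, chunksize=4)):
    result = chunk * 2  # placeholder for the real per-chunk work
    # First chunk overwrites any old file and writes the header; later chunks append
    result.to_csv(dst, mode="w" if i == 0 else "a", header=(i == 0), index=False)

out = pd.read_csv(dst)
print(out)
```

Writing the first chunk in write mode avoids having to delete the results file by hand between runs.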
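Other options I would try:
1) Use a cloud resource such as AWS EC2: rent a high-spec, high-memory machine, move your data and code onto it, and run your code there. It should be a lot faster.
2) I would consider using something like PySpark to distribute the load across multiple machines, but if you are not already familiar with it, it may take some time to get up to speed.
Good luck!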