Make this pandas code as lean and fast as possible? [Iterating over large DataFrames and setting values]

Time: 2019-02-27 23:48:13

Tags: python pandas dataframe finance

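For context, my main dataset is a 24541-row × 1830-column DataFrame filled with either NaN or floats (stock prices). I process this DataFrame 11 times, each time setting values in a transformed DataFrame that has the same index and columns. Examples of both DataFrames are below: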

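import numpy as np
import pandas as pd

# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults
data = pd.read_csv(filepath, index_col=0, parse_dates=True).astype(np.float64)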

#dataset of daily prices
data.head()

Out[14]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-02    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-04    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-05    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-06    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN

[5 rows x 1830 columns]

MA_a_frame = pd.DataFrame(
        data=0,
        index=data.index, 
        columns=data.columns)

#bool DataFrame
MA_a_frame.head()

Out[15]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31      0      0      0      0  ...      0      0      0      0
1926-01-02      0      0      0      0  ...      0      0      0      0
1926-01-04      0      0      0      0  ...      0      0      0      0
1926-01-05      0      0      0      0  ...      0      0      0      0
1926-01-06      0      0      0      0  ...      0      0      0      0

[5 rows x 1830 columns]

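A value in MA_a_frame (and in the 10 other identical DataFrames) is set to 1 if a particular condition is met in the DataFrame "data": namely, if the price in data is within 1% of a calculated value in an entirely different DataFrame produced by a previous function (the tolerance parameter is "j"). So in total, each iteration works with up to 3 large DataFrames.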
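Written as a formula, with P the price in data and M the corresponding value in MA_a_csv for the same date and security, the test used in gen_a is a two-sided bound on the log ratio. For positive prices it is equivalent to a multiplicative band around P, which is roughly ±1% if j = 0.01 (that value of j is an assumption; the post does not state it):

```latex
-j < \ln\frac{M}{P} < j
\iff e^{-j} < \frac{M}{P} < e^{j}
\iff e^{-j}P < M < e^{j}P \qquad (P > 0)
```

With j = 0.01 the bounds are e^{-0.01} ≈ 0.99005 and e^{0.01} ≈ 1.01005.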

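For my iterators, I simply use data.columns and data.index to create two separate lists ("dates" and "securities"), so in effect I am iterating indirectly over the data's index and columns. Without further ado, here is the core code, which runs 11 times in total in my program (this is the part I want to speed up!):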
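To make the size of that loop concrete, here is a sketch on a made-up 2×3 frame of how the two lists could be built. The `tolist()` calls are an assumption (the post only says the lists come from data.index and data.columns). Every (date, security) pair is visited, including the NaN cells that are then skipped:

```python
import numpy as np
import pandas as pd

# Made-up 2 x 3 stand-in for the real 24541 x 1830 price frame
data = pd.DataFrame(
    [[np.nan, 10.0, 20.0],
     [np.nan, 10.5, np.nan]],
    index=pd.to_datetime(["1926-01-04", "1926-01-05"]),
    columns=["49154", "65541", "32791"],
)

# Assumed construction of the two lists described in the question
dates = data.index.tolist()
securities = data.columns.tolist()

# The nested loop visits every (date, security) pair, NaN or not
n_visits = len(dates) * len(securities)
n_nan = int(data.isna().sum().sum())
print(n_visits, n_nan)  # 6 3

# At full size: 24541 * 1830 = 44,910,030 visits per run of gen_a
```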

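import math

import pandas as pd

def gen_a():
    # dates, securities, data, MA_a_csv, MA_dict and j are defined elsewhere in the program
    for date in dates:
        for security in securities:
            try:
                price = data.loc[date, security]
                if pd.isna(price):
                    # lots of the data is NaN, so skip these altogether
                    # (a `type(...) is not float` check would also skip every real price,
                    # because values read from a float64 frame are numpy.float64)
                    pass
                elif j > math.log(MA_a_csv.loc[date, security] / price) > -j:
                    MA_dict['a'].loc[date, security] = 1
                print(f'Passed {date}, {security}')
            except Exception:
                print(f'Failed {date}, {security}')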
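A common intermediate step before full vectorization is to pull the values out as NumPy arrays once and loop over positions instead of labels, which avoids the per-call overhead of `.loc`. This is a sketch only, assuming `data` and `MA_a_csv` share the same index and columns (as the post says) and that prices are positive; `gen_a_numpy` is a hypothetical name, and it returns a new 0/1 frame rather than writing into `MA_dict`:

```python
import math

import numpy as np
import pandas as pd

def gen_a_numpy(data: pd.DataFrame, ma: pd.DataFrame, j: float) -> pd.DataFrame:
    """Same test as gen_a, but on raw arrays (hypothetical helper; assumes positive values)."""
    prices = data.to_numpy(dtype=np.float64)
    ma_vals = ma.to_numpy(dtype=np.float64)
    flags = np.zeros(prices.shape, dtype=np.int64)
    n_rows, n_cols = prices.shape
    for r in range(n_rows):
        for c in range(n_cols):
            p, m = prices[r, c], ma_vals[r, c]
            if math.isnan(p) or math.isnan(m):
                continue  # skip missing prices or missing comparison values
            if j > math.log(m / p) > -j:
                flags[r, c] = 1
    return pd.DataFrame(flags, index=data.index, columns=data.columns)

# Tiny usage example with made-up numbers
demo = gen_a_numpy(pd.DataFrame({"x": [100.0, np.nan, 50.0]}),
                   pd.DataFrame({"x": [100.5, 80.0, 60.0]}), j=0.01)
print(demo["x"].tolist())  # [1, 0, 0]
```

This still performs one Python-level iteration per cell, so it is much slower than the whole-frame approach; its main benefit is that it keeps the original loop structure recognizable.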

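The problem is that one cycle of this code takes about 8 hours, so each full run takes nearly 90 hours. I have to write an academic paper as a graduation requirement, and with the deadline approaching, these numbers really scare me! If my output is perfect, everything will be fine, but I would be forever grateful for any suggestions on cutting the runtime. Otherwise, I may have to shrink my data range, which would weaken my statistical analysis.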
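For a sense of scale, the dimensions given at the top of the post put one 8-hour pass at well under a millisecond per cell. That suggests the time goes into per-cell overhead (label lookups, writes, printing) rather than the arithmetic itself. A quick check of the numbers (inputs from the post; the per-cell figure is derived, not measured):

```python
rows, cols = 24541, 1830          # size of the price DataFrame
cells = rows * cols               # cells visited by one run of gen_a
seconds_per_run = 8 * 60 * 60     # "about 8 hours" per cycle
per_cell_ms = seconds_per_run / cells * 1000
total_hours = 11 * 8              # 11 runs per full program
print(cells, round(per_cell_ms, 3), total_hours)  # 44910030 0.641 88
```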

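P.S. I'm running this through Spyder on Windows 10 with an Intel i7 3970X. I don't have access to any other computing power. I considered GPU acceleration, but my GPU is a GTX 670, which is not a Pascal-generation card, so it isn't compatible with CuDF.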

Edit:

Here are the bottom five rows of the data DataFrame:

s.head()
Out[16]: 
            49154      65541  32791  65568  ...  24563  81910  24571  90110
DATE                                        ...                            
2018-12-24  61.55  232.70000    NaN    NaN  ...    NaN  15.71    NaN    NaN
2018-12-26  65.11  244.59000    NaN    NaN  ...    NaN  16.48    NaN    NaN
2018-12-27  64.71  252.17999    NaN    NaN  ...    NaN  16.71    NaN    NaN
2018-12-28  64.96  249.64999    NaN    NaN  ...    NaN  16.55    NaN    NaN
2018-12-31  66.09  254.50000    NaN    NaN  ...    NaN  16.74    NaN    NaN

[5 rows x 1830 columns]

Here is an example of the comparison DataFrame:

Out[23]: 
              49154       65541  32791  65568  ...  24563    81910  24571  90110
DATE                                           ...                              
2018-12-24  76.3430  258.376200    NaN    NaN  ...    NaN  19.8672    NaN    NaN
2018-12-26  75.9530  258.143600    NaN    NaN  ...    NaN  19.7980    NaN    NaN
2018-12-27  75.5552  258.127199    NaN    NaN  ...    NaN  19.7238    NaN    NaN
2018-12-28  75.1382  257.878799    NaN    NaN  ...    NaN  19.6440    NaN    NaN
2018-12-31  74.7716  257.683199    NaN    NaN  ...    NaN  19.5600    NaN    NaN

[5 rows x 1830 columns]

Edit 2:

As requested, here is data.head().to_dict():

  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '44792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85753': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12044': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20239': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28433': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12052': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
  Timestamp('1926-01-02 00:00:00'): 326.5,
  Timestamp('1926-01-04 00:00:00'): 325.0,
  Timestamp('1926-01-05 00:00:00'): 325.5,
  Timestamp('1926-01-06 00:00:00'): 326.25},
 '12062': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12067': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77605': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77606': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20263': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12073': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12076': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
  Timestamp('1926-01-02 00:00:00'): 124.25,
  Timestamp('1926-01-04 00:00:00'): 127.125,
  Timestamp('1926-01-05 00:00:00'): 123.75,
  Timestamp('1926-01-06 00:00:00'): 124.5},
 '61241': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12095': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28484': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77644': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28505': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77659': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12124': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77661': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28513': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61284': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77668': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12140': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85869': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28548': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77702': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12167': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85908': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
  Timestamp('1926-01-02 00:00:00'): 78.0,
  Timestamp('1926-01-04 00:00:00'): 77.5,
  Timestamp('1926-01-05 00:00:00'): 76.875,
  Timestamp('1926-01-06 00:00:00'): 76.5},
 '44951': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85913': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85914': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12191': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20386': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77730': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28580': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85926': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20394': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69550': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12212': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20407': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20415': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77768': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85963': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45014': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61399': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69607': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85991': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20474': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20482': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86021': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69649': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12308': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20503': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86041': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12319': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20511': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12345': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20554': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12369': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20562': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86102': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20570': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86111': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
  Timestamp('1926-01-02 00:00:00'): 124.0,
  Timestamp('1926-01-04 00:00:00'): 123.25,
  Timestamp('1926-01-05 00:00:00'): 123.5,
  Timestamp('1926-01-06 00:00:00'): 122.75},
 '36978': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86136': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28804': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86158': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61583': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20626': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77976': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53401': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86176': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12449': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69796': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12456': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12458': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20650': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28847': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 ...}

Unfortunately I don't have enough room in this post, but MA_a_csv.head().to_dict() gives the same result as above, except that it contains only NaN and no data points.

3 Answers:

Answer 0 (score: 1)

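I made my own sample data generator based on the examples you provided. I think it matches what you have, but let me know if it doesn't. As long as the data matches, don't worry about how I generated it.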

import numpy as np
import pandas as pd

rows = 6
cols = 5
np.random.seed(0)
# pd.DatetimeIndex(start=..., periods=...) was removed in pandas 1.0; date_range builds the same index
data = pd.DataFrame(np.random.rand(rows, cols) * 100,
                    index=pd.date_range(start='1928-12-31', periods=rows, freq='D'))
nan_cols = len(data.columns) // 2
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                     .sample(nan_cols, random_state=1, replace=True),
                     pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
    data.loc[:row, col] = np.nan   # leading NaNs, like a security that starts trading later

# comparison frame: each value within about ±2% of data
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50
                        * np.random.choice([-1, 1], size=(rows, cols))))

So data looks like this:

                    0          1          2          3          4
1928-12-31  54.881350  71.518937        NaN  54.488318        NaN
1929-01-01  64.589411  43.758721        NaN  96.366276  38.344152
1929-01-02  79.172504  52.889492  56.804456  92.559664   7.103606
1929-01-03   8.712930   2.021840  83.261985  77.815675  87.001215
1929-01-04  97.861834  79.915856  46.147936  78.052918  11.827443
1929-01-05  63.992102  14.335329  94.466892  52.184832  41.466194

And MA_a_csv looks like this:

                    0          1          2          3          4
1928-12-31  55.171734  72.626384        NaN  55.107778        NaN
1929-01-01  63.791557  44.294412        NaN  98.185186  38.867028
1929-01-02  78.603241  53.351780  57.597027  92.448175   7.008877
1929-01-03   8.829794   2.013333  83.047291  77.324770  86.368349
1929-01-04  98.977844  80.616881  45.235708  77.893620  11.876852
1929-01-05  63.785651  14.522579  94.945445  52.671519  41.668902

I ran it through a program similar to your gen_a, then wrote a vectorized version that gives the same answer:

logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})

ans

            0  1  2  3  4
1928-12-31  1  0  0  0  0
1929-01-01  0  0  0  0  0
1929-01-02  1  1  0  1  0
1929-01-03  0  1  1  1  1
1929-01-04  0  1  0  1  1
1929-01-05  1  0  1  1  1
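A self-contained check of that equivalence on made-up data follows. It puts a plain loop in the style of gen_a (minus try/except and printing) next to the vectorized expression. The value j = 0.01 and the random data are assumptions for illustration, and `.astype("int64")` stands in for the `.replace` call:

```python
import math

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.uniform(1, 100, size=(50, 8)))
data.iloc[:10, :3] = np.nan                         # leading NaNs in a few columns
MA_a_csv = data * (1 + rng.uniform(-0.02, 0.02, size=data.shape))
j = 0.01

# Loop version in the style of gen_a: one scalar lookup per cell
loop = pd.DataFrame(0, index=data.index, columns=data.columns, dtype="int64")
for date in data.index:
    for security in data.columns:
        price = data.at[date, security]
        if pd.isna(price):
            continue
        if j > math.log(MA_a_csv.at[date, security] / price) > -j:
            loop.at[date, security] = 1

# Vectorized version: whole-frame log and comparisons
logs = np.log(MA_a_csv / data)
vec = ((j > logs) & (logs > -j)).astype("int64")

print(loop.equals(vec))  # True
```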

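np.log can operate on the whole array at once, and pandas is happy to compare one vector against another element by element. & is the element-wise (bitwise) AND, so it simply checks that both conditions hold at each position.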
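Here is a small illustration of the element-wise `&` on boolean Series, with made-up values. The parentheses are required because `&` binds more tightly than `>` in Python; without them the expression would be parsed as a chained comparison. The `abs` form is shown as an equivalent alternative:

```python
import numpy as np
import pandas as pd

j = 0.01
logs = pd.Series([0.005, -0.02, np.nan, 0.012])

in_band = (j > logs) & (logs > -j)    # element-wise AND of two boolean Series
print(in_band.tolist())               # [True, False, False, False]

# Same result with a single comparison on the absolute value; NaN compares as False
print(logs.abs().lt(j).tolist())      # [True, False, False, False]
```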

This is about 180 times faster than my version of gen_a (without the try/except or print statements), so the improvement for your code should be even bigger.
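Since the question runs the same test against 11 comparison frames, the vectorized expression can simply be applied to each one in turn. This is a sketch, assuming the comparison frames are kept in a dict keyed like `MA_dict` ("a", "b", ...). The keys, the helper name `in_band_flags`, and the tiny data are hypothetical:

```python
import numpy as np
import pandas as pd

def in_band_flags(prices: pd.DataFrame, reference: pd.DataFrame, j: float) -> pd.DataFrame:
    """1 where -j < log(reference / prices) < j, else 0 (NaN counts as 0). Hypothetical helper."""
    logs = np.log(reference / prices)
    return ((j > logs) & (logs > -j)).astype("int64")

data = pd.DataFrame({"12060": [326.0, 326.5], "12079": [117.5, np.nan]})
comparisons = {                    # hypothetical stand-ins for the 11 comparison frames
    "a": data * 1.005,
    "b": data * 1.05,
}
MA_dict = {key: in_band_flags(data, ref, j=0.01) for key, ref in comparisons.items()}
print(MA_dict["a"].to_numpy().tolist())  # [[1, 1], [1, 0]]
print(MA_dict["b"].to_numpy().tolist())  # [[0, 0], [0, 0]]
```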

You also don't need the .replace({True: 1, False: 0}) part: in Python, 1 == True and 0 == False both evaluate to True, so you should be able to use them interchangeably.
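Here is the last point in concrete form, using made-up values. A boolean result can be used directly, because `True` and `False` behave as 1 and 0 in arithmetic. If an integer frame is really needed, `astype` converts it explicitly:

```python
import pandas as pd

flags = pd.DataFrame({"a": [True, False, True], "b": [False, False, True]})

print(True == 1, False == 0)                # True True
print(int(flags.sum().sum()))               # 3: booleans count as 1 when summed
print(flags.astype("int64")["a"].tolist())  # [1, 0, 1]
```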

Let me know if you have any questions about this. For further reading, I'd recommend Tom Augspurger's Modern Pandas articles; the Fast Pandas section is especially relevant.

Answer 1 (score: 0)

Combining two short comments into one answer.

1) The statement

j > math.log(
   MA_a_csv.loc[date, security]/
   data.loc[date, security]) > -j
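can be simplified slightly by using abs, e.g. j > abs(...), and could probably be sped up considerably by computing the logs of each frame once, separately, and using the fact that log(a/b) == log(a) - log(b).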
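Both suggestions can be sketched on whole frames at once. Take the log of the price frame a single time so it can be reused across the 11 comparisons, subtract, and compare the absolute value. As a swapped-in alternative not mentioned in the answer, the per-cell log can be avoided entirely by comparing the ratio against the fixed bounds exp(±j). The names and data here are illustrative assumptions:

```python
import numpy as np
import pandas as pd

j = 0.01
prices = pd.DataFrame({"s1": [100.0, 50.0, np.nan], "s2": [20.0, 20.0, 20.0]})
ma = pd.DataFrame({"s1": [100.5, 51.0, 10.0], "s2": [19.9, 21.0, 20.0]})

# log(a/b) == log(a) - log(b): compute log(prices) once and reuse it
log_prices = np.log(prices)
flags_log = (np.log(ma) - log_prices).abs() < j

# Same test without a log per cell: exp(-j) < ma/prices < exp(j)
ratio = ma / prices
flags_ratio = (ratio > np.exp(-j)) & (ratio < np.exp(j))

print(flags_log.equals(flags_ratio))              # True
print(flags_log.astype(int).to_numpy().tolist())  # [[1, 1], [0, 0], [0, 1]]
```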

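Even if each cell is only computed once per run, you can compute these values and write them back to disk, which speeds up subsequent reruns.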
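One way to do that caching is to save the computed log frame once and reload it on later runs. This is a minimal sketch using pickle files via `DataFrame.to_pickle` and `pd.read_pickle`; the cache path and the helper name `load_or_compute_logs` are hypothetical, and Parquet or HDF5 would work similarly if those libraries are installed:

```python
import os
import tempfile

import numpy as np
import pandas as pd

def load_or_compute_logs(prices: pd.DataFrame, cache_path: str) -> pd.DataFrame:
    """Return log(prices), reading cache_path if it exists (hypothetical helper).

    Note: the cache is not invalidated automatically if the prices change.
    """
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    logs = np.log(prices)
    logs.to_pickle(cache_path)    # written once, reused on every rerun
    return logs

prices = pd.DataFrame({"12060": [326.0, 326.5, 325.0]})
cache = os.path.join(tempfile.mkdtemp(), "log_prices.pkl")

first = load_or_compute_logs(prices, cache)    # computes and writes the cache
second = load_or_compute_logs(prices, cache)   # reads the cached copy
print(first.equals(second), os.path.exists(cache))  # True True
```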

2) If your actual code contains those print statements, they will account for a significant share of the total runtime.

Answer 2 (score: -1)

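Maybe use the chunksize parameter when reading the csv. You'll need to experiment to find the best size, but a rule of thumb I've heard is to use about half of your available memory. Note that chunksize is a number of rows, not bytes, so a memory budget has to be translated into a row count, and that read_csv then returns an iterator of DataFrame chunks rather than a single DataFrame:

chunks = pd.read_csv("your.csv", chunksize=rows_per_chunk)  # rows_per_chunk: a hypothetical integer row count

When writing the results back to a file, make sure you set the append mode:

df.to_csv("yourresults.csv", mode='a')

Either delete the results file each time you run the code, or make sure the first call to to_csv is made in write mode (the default).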
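Put together, the chunked read and the append-mode write might look like the sketch below. It rests on stated assumptions: the file names, the 2-row chunk size, and the doubling step are placeholders for the real files and per-chunk computation. The first chunk is written in write mode with a header, and later chunks are appended without one:

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "your.csv")
out = os.path.join(workdir, "yourresults.csv")

# Small made-up input file standing in for the real price data
pd.DataFrame({"DATE": ["1926-01-02", "1926-01-04", "1926-01-05"],
              "12060": [326.5, 325.0, 325.5]}).to_csv(src, index=False)

# chunksize is a number of rows; read_csv then yields one DataFrame per chunk
for i, chunk in enumerate(pd.read_csv(src, index_col=0, chunksize=2)):
    result = chunk * 2            # placeholder for the real per-chunk work
    # first chunk: write mode plus header; later chunks: append without a header
    result.to_csv(out, mode="w" if i == 0 else "a", header=(i == 0))

combined = pd.read_csv(out, index_col=0)
print(combined["12060"].tolist())  # [653.0, 650.0, 651.0]
```

Splitting by rows like this suits per-cell work such as the comparison in the question; anything that needs earlier rows would need extra care at chunk boundaries.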

Other options I would try:

1) Use a cloud resource such as AWS EC2: rent a high-spec, high-memory machine, move your data and code onto it, and run your code there. It should be much faster.

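2) I would consider something like PySpark to distribute the load across multiple machines, but if you aren't already familiar with it, it may take a while to get up to speed.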

Good luck!