Question

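I have a pandas Series with a MultiIndex and am trying to plot each outer index level in its own subplot, but it runs very slowly.

To split the data up, I use a for loop over the outer level of the MultiIndex and plot the Series using the inner index level as the x coordinate.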

def plot_series( data ):
    # create 16 subplots, corresponding to the 16 outer index levels
    fig, axs = plt.subplots( 4, 4 )

    for oi in data.index.get_level_values( 'outer_index' ):
        # calculate subplot to use
        row = int( oi/ 4 )
        col = int( oi - row* 4 )

        ax = axs[ row, col ]
        data.xs( oi ).plot( use_index = True, ax = ax )

    plt.show()

Each outer index level has 1000 data points, but the plotting takes several minutes to complete.

Is there a way to speed up the plotting?

Data

num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)

Answer 1

Instead of looping over data.index.get_level_values( 'outer_index' ), you can use data.groupby(level='outer_index') and iterate through the grouped object:

for name, group in grouped:
   #do stuff

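This removes the bottleneck of building a sliced copy with data.xs( oi ) on every pass. It also changes how many passes there are: data.index.get_level_values( 'outer_index' ) returns one label per row (16,000 here, each outer value repeated 1,000 times), so the original loop plots into every subplot 1,000 times, whereas groupby yields each of the 16 outer levels exactly once.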
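To make the difference in loop length concrete, here is a minimal sketch that only counts iterations and does no plotting. It reuses the data layout from the question; the variable names n_loop_iterations, n_groups and n_unique are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

num_out = 16
num_in = 1000

# Same layout as in the question: 16 outer levels x 1000 inner levels
data = pd.Series(
    data=np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=['outer_index', 'inner_index'],
    ),
)

# get_level_values returns one label per row, repeats included
labels = data.index.get_level_values('outer_index')
n_loop_iterations = len(labels)

# groupby produces one group per distinct outer label
n_groups = data.groupby(level='outer_index').ngroups

# deduplicating the labels gives the same count as groupby
n_unique = len(labels.unique())

print(n_loop_iterations, n_groups, n_unique)  # 16000 16 16
```

If you would rather keep the data.xs( oi ) style, looping over labels.unique() should cut the number of passes in the same way, although groupby is usually the more idiomatic choice.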

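import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_series(data):
    # one group per distinct outer index level (16 here)
    grouped = data.groupby(level='outer_index')

    fig, axs = plt.subplots( 4, 4 )
    for name, group in grouped:
        # map the outer index value to a position in the 4x4 grid
        row = int( name/ 4 )
        col = int( name - row* 4 )
        ax = axs[ row, col ]
        group.plot( use_index = True, ax = ax )

    # show the figure once, after all subplots are drawn
    plt.show()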



num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)

plot_series(data)

Using timeit, you can see that this approach is much faster:

%timeit plot_series(data)
795 ms ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
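%timeit is an IPython magic, so it is not available in a plain Python script. As a rough sketch of how a similar measurement could be taken there, the snippet below times a single run with time.perf_counter. It assumes the non-interactive Agg backend so that no window blocks the timer, and plot_grouped is a hypothetical variant of plot_series that returns the figure instead of calling plt.show(). The number it prints depends on your machine and is not comparable to the figure quoted above.

```python
import time

import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_grouped(data):
    """Hypothetical variant of plot_series that returns the figure."""
    fig, axs = plt.subplots(4, 4)
    for name, group in data.groupby(level='outer_index'):
        # map the outer index value to a (row, col) position in the grid
        row, col = divmod(int(name), 4)
        group.plot(use_index=True, ax=axs[row, col])
    return fig


num_out = 16
num_in = 1000
data = pd.Series(
    data=np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=['outer_index', 'inner_index'],
    ),
)

start = time.perf_counter()
fig = plot_grouped(data)
fig.canvas.draw()  # force the actual rendering so it is included in the timing
elapsed = time.perf_counter() - start
plt.close(fig)

print(f'{elapsed:.3f} s for one run')
```

A single run is noisy; repeating the measurement a few times and taking the mean, as %timeit does, gives a steadier number.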
