Is it possible to vectorize annotation for matplotlib?

Time: 2017-09-20 13:45:20

Tags: performance python-3.x pandas matplotlib vectorization

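As part of a large QC benchmark, I am creating a large number (around 100K) of scatter plots in a single PDF using the PdfPages backend. (See the code below.)

The problem I am having is that plotting takes too much time. Here is the output of my custom profiling/debugging code: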

Checkpoint1: Predictions done in 1.110076904296875 millis
Checkpoint2: df created and correlations calculated in 3.108978271484375 millis
Checkpoint3: plotting and accumulating done in 231.31990432739258 millis
Cycle completed in 0.23553895950317383 secs
----------------------
Checkpoint1: Predictions done in 3.718852996826172 millis
Checkpoint2: df created and correlations calculated in 2.353191375732422 millis
Checkpoint3: plotting and accumulating done in 155.93385696411133 millis
Cycle completed in 0.16200590133666992 secs
----------------------
Checkpoint1: Predictions done in 2.920866012573242 millis
Checkpoint2: df created and correlations calculated in 1.995086669921875 millis
Checkpoint3: plotting and accumulating done in 161.8819236755371 millis
Cycle completed in 0.16679787635803223 secs
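
The profiling code that produced these checkpoint lines is not shown in the post. A minimal sketch of how such per-stage timings could be collected, assuming a hypothetical `checkpoint` context manager built on `time.perf_counter` (the name, the `timings` dictionary, and the exact message format are illustrative, not the author's):

```python
import time
from contextlib import contextmanager


@contextmanager
def checkpoint(label, timings):
    """Time the enclosed block and record the result in milliseconds.

    Hypothetical helper: `label` and `timings` are illustrative names.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings[label] = elapsed_ms
        print(f"{label} done in {elapsed_ms} millis")
```

Wrapping each stage (predictions, DataFrame creation, plotting) in its own `with checkpoint(...)` block would give a breakdown comparable to the one above, and keeping the values in a dictionary makes it easy to aggregate them over many cycles.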

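Annotating the points, which the use case requires, increases the plotting times by a factor of 2-3. As you can see below, I have tried both itertuples() and apply(); switching to apply() did not change my timings significantly.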

import matplotlib.pyplot as plt


def annotate(row, ax):
    """ Label one point; row.name is the index label of the row """
    ax.annotate(row.name, (row.exp, row.model),
                xytext=(10, 20), textcoords='offset points',
                arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
                family='sans-serif', fontsize=8, color='darkslategrey')


def plot2File(df, file, seq, z, p, s):
    """ Plot predictions vs experimental """
    plttitle = f"Correlations for {seq}+{z} \n pearson={p} \n spearman={s}"
    ax = df.plot(x='exp', y='model', kind='scatter', title=plttitle, s=40)
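    # Annotate every point, one row at a time.
    df.apply(annotate, ax=ax, axis=1)
    # Alternative that was tried, with no major change in timing:
#     for row in df.itertuples():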
#         ax.annotate(row.Index, (row.exp, row.model),
#                     xytext=(10, 20), textcoords='offset points',
#                     arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
#                     family='sans-serif', fontsize=8, color='darkslategrey')

    plt.savefig(file, bbox_inches='tight', format='pdf')
    plt.close()
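
Since switching between apply() and itertuples() made little difference, one way to check whether row iteration matters at all is to time it in isolation, with the drawing call replaced by a no-op. The sketch below is an illustration under assumptions: `noop`, the `iterate_*` functions, `time_iteration`, and the synthetic 30-row frame are all hypothetical, and the real frame's size is not given in the post.

```python
import timeit

import numpy as np
import pandas as pd


def noop(label, x, y):
    """Hypothetical stand-in for ax.annotate that does no drawing work."""
    return None


def iterate_apply(df):
    # One Series is built per row, as in the df.apply(annotate, ...) call.
    df.apply(lambda row: noop(row.name, row.exp, row.model), axis=1)


def iterate_itertuples(df):
    # One namedtuple is built per row, as in the commented-out loop.
    for row in df.itertuples():
        noop(row.Index, row.exp, row.model)


def iterate_zip(df):
    # Columns are converted to NumPy arrays once, then zipped.
    for label, x, y in zip(df.index, df["exp"].to_numpy(), df["model"].to_numpy()):
        noop(label, x, y)


def time_iteration(df, repeats=20):
    """Return the mean time per pass, in milliseconds, for each strategy."""
    strategies = (iterate_apply, iterate_itertuples, iterate_zip)
    return {
        fn.__name__: timeit.timeit(lambda fn=fn: fn(df), number=repeats) / repeats * 1000.0
        for fn in strategies
    }


# Synthetic data; the real frame's size and index are not given in the post.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"exp": rng.random(30), "model": rng.random(30)},
                    index=[f"p{i}" for i in range(30)])
print(time_iteration(demo))
```

If all three figures turn out to be small next to the 150-230 ms reported for Checkpoint3, the iteration strategy is probably not where the time goes, and the cost is more likely inside matplotlib itself.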

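Given the nice explanation by Jeff of the problems with iterrows(), I am wondering whether it is possible to vectorize the annotation process. Or should I ditch the DataFrame altogether?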
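
For context, `ax.annotate` creates one Annotation artist per call, and matplotlib does not appear to offer a bulk variant that accepts arrays of labels and positions, so the loop itself probably cannot be vectorized. What can be tried is reducing the work around it. The following is a sketch under assumptions, not a measured fix: `ARROW`, `annotate_points`, and `plot_to_pdf` are hypothetical names, the non-interactive Agg backend is assumed, one figure is reused across pages instead of being created by `df.plot` each time, and `bbox_inches='tight'` is dropped, which may change the page layout.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no interactive display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Build the arrow style once instead of once per annotation.
ARROW = dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10")


def annotate_points(ax, labels, xs, ys):
    """Annotate each point; still one ax.annotate call per point."""
    for label, x, y in zip(labels, xs, ys):
        ax.annotate(str(label), (x, y),
                    xytext=(10, 20), textcoords="offset points",
                    arrowprops=ARROW,
                    family="sans-serif", fontsize=8, color="darkslategrey")


def plot_to_pdf(pdf, fig, ax, df, title):
    """Draw one annotated scatter plot on a reused Axes and append it to pdf."""
    ax.cla()  # clear the previous page's points, title, and annotations
    xs = df["exp"].to_numpy()
    ys = df["model"].to_numpy()
    ax.scatter(xs, ys, s=40)
    ax.set_xlabel("exp")
    ax.set_ylabel("model")
    ax.set_title(title)
    annotate_points(ax, df.index, xs, ys)
    # No bbox_inches='tight' here (assumption: default page layout is fine).
    pdf.savefig(fig)
```

A caller would create the figure once with `fig, ax = plt.subplots()`, open `PdfPages(path)` in a `with` block, call `plot_to_pdf` for each DataFrame, and close the figure at the end. Whether this helps, and by how much, would have to be measured against the checkpoint timings above.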

0 answers:

No answers