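I am using scipy to cluster a chemical dataset of 10k items. First I compute the pairwise distances with pdist(), then I cluster with ward(). Next I combine two dendrograms and a matrix, which gives a heatmap visualization of the clusters. For the dendrograms, I label the data with an array of colors, which gives colored labels for the cluster leaves. When I tested with a subset of the data, it ran fine on my local Windows machine.
However, when I run the code on all 10k items on my university's HPC cluster, which runs Linux, it fails with an error that appears to be related to the matplotlib Agg backend (which, as far as I understand, is the backend I have to use on the HPC).
My questions: has anyone run into the same problem when using Agg? And is there a way to save the clustering I generate (Z), so that I can render the plot on my local machine instead of on the HPC?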
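On the second question, one possible approach is to write the linkage matrix to disk with numpy and reload it elsewhere. The following is only a minimal sketch under stated assumptions: the random binary data, the "jaccard" metric, the temporary directory, and the file name linkage_Z.npy are hypothetical stand-ins, not the real dataset or paths.

```python
import os
import tempfile

import numpy as np
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import pdist

# Hypothetical stand-in data: 50 random binary fingerprints of 64 bits each
rng = np.random.RandomState(0)
C = rng.randint(0, 2, size=(50, 64)).astype(bool)

X = pdist(C, "jaccard")  # condensed pairwise distances
Z = ward(X)              # linkage matrix with shape (n - 1, 4)

# On the HPC: save the linkage matrix to a .npy file
outdir = tempfile.mkdtemp()  # hypothetical output location
z_path = os.path.join(outdir, "linkage_Z.npy")
np.save(z_path, Z)

# On the local machine: reload it and compute the leaf order without drawing
Z_loaded = np.load(z_path)
leaves = dendrogram(Z_loaded, no_plot=True)["leaves"]
```

The condensed distance vector X could be saved the same way if the heatmap matrix is also needed locally; squareform(X) rebuilds the full matrix from it.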
Thanks for the replies in the comments. The actual error when debugging the batch job is as follows:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /home/lip13lm/.conda/envs/rdkit-/lib/python3.5/site-packages/numpy/core/../../../../libiomp5.so
Program received signal SIGSEGV, Segmentation fault.
generate (this=<optimized out>, len=840, y=1836238, x=1563, span=0xe91b160) at extern/agg24-svn/include/agg_span_image_filter_rgba.h:68
68 extern/agg24-svn/include/agg_span_image_filter_rgba.h: No such file or directory.
#0 generate (this=<optimized out>, len=840, y=1836238, x=1563, span=0xe91b160) at extern/agg24-svn/include/agg_span_image_filter_rgba.h:68
#1 generate (len=840, y=629, x=0, span=<optimized out>, this=0x7fffffffba40) at extern/agg24-svn/include/agg_span_converter.h:45
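On the HPC, my environment uses Python 3.5.3 and matplotlib 2.0.0.
When I plotted a simple dendrogram on the HPC with the same amount of data, it ran successfully. The error only appears after I edited the code to integrate the two dendrograms with the matrix into a heatmap with color labels. So I assume the amount of data is not the problem, since I had already tested it with the same backend.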
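For comparison, a single-dendrogram test of the kind described above might look roughly like the sketch below. It is not the original test script: the random data, the "jaccard" metric, and the in-memory PNG buffer are assumptions made so the example is self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for a headless node
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import pdist

# Hypothetical stand-in data: 40 random binary fingerprints of 32 bits each
rng = np.random.RandomState(1)
C = rng.randint(0, 2, size=(40, 32)).astype(bool)

Z = ward(pdist(C, "jaccard"))

# Draw one dendrogram on its own axes, with no image (matshow) involved
fig, ax = plt.subplots(figsize=(8, 4))
result = dendrogram(Z, ax=ax, no_labels=True,
                    color_threshold=30, above_threshold_color="grey")
ax.set_ylabel("distance")

# Render to PNG through Agg; a buffer is used here instead of a file path
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```

The difference from the full script is that no image is drawn here, so the Agg image-filter code that appears in the backtrace is not exercised by this test.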
Here is my code:
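import matplotlib
matplotlib.use('Agg')  # select the non-interactive Agg backend before importing pyplot
# The imports below are inferred from the names used in the script (assumed aliases)
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import pylab
import scipy.spatial.distance as dist
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import ward, dendrogram
# C, coef, size, classes and N are assumed to be defined earlier in the script (not shown)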
X = dist.pdist(C,coef)
Y = squareform(X)
Z = ward(X)
print("Starting dendrogram heatmap plotting...")
plt.clf() # plt.clf() clears the entire current figure with all its axes,
# but leaves the window opened, such that it may be reused for other plots
fig=plt.figure(figsize=(14, 15))
# Compute and plot FIRST (1) dendrogram
# add_axes(left,bottom,width,height)
ax1=fig.add_axes([0.09,0.1,0.2,0.6])
Z1=dendrogram(
    Z,
    color_threshold=30,
    orientation='left',
    above_threshold_color="grey"
)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.spines['right'].set_visible(False)
ax1.spines['left'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.spines['bottom'].set_visible(False)
# Compute and plot SECOND (2) dendrogram
ax2=fig.add_axes([0.3,0.71,0.6,0.2])
plt.title('Ward Hierarchical Clustering Dendrogram MDDR\n\n [ size: %d bits - coefficient: %s ]\n' %(size,coef))
Z2=dendrogram(
    Z,
    color_threshold=30,
    no_labels=True,
    above_threshold_color="grey"
)
ax2.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.spines['bottom'].set_visible(False)
plt.ylabel('distance')
# Method 1: Assign bar color label below dendrogram
color=[classes[k] for k in Z2['leaves']]
b=.1*Z[-2,2] # adjust the height of each bar
plt.bar(
    np.arange(N)*10,
    np.ones(N)*b,
    bottom=-b,
    width=10,
    color=color,
    edgecolor='none',
    align='center'
)
plt.gca().set_ylim((-b,None))
# Plot distance matrix
axmatrix=fig.add_axes([0.3,0.1,0.6,0.6])
idx1=Z1['leaves']
idx2=Z2['leaves']
Y=Y[idx1,:]
Y=Y[:,idx2]
# http://matplotlib.org/examples/color/colormaps_reference.html
im=axmatrix.matshow(Y, aspect='auto', origin='lower', cmap=pylab.cm.gist_earth)
axmatrix.set_xticks([]) # disable matrix x-axis ticks & label
axmatrix.set_yticks([]) # disable matrix y-axis ticks & label
axmatrix.spines['right'].set_visible(False)
axmatrix.spines['left'].set_visible(False)
axmatrix.spines['top'].set_visible(False)
axmatrix.spines['bottom'].set_visible(False)
# Plot colorbar
axcolorbar=fig.add_axes([0.91,0.1,0.02,0.6])
cb=pylab.colorbar(im, cax=axcolorbar)
cb.outline.set_visible(False)
# Configure legend
# http://matplotlib.org/examples/color/named_colors.html
i_patch=mpatches.Patch(color='black', label='Inactives')
t_patch=mpatches.Patch(color='tan', label='Thrombin')
s_patch=mpatches.Patch(color='teal', label='SubP')
r_patch=mpatches.Patch(color='lightpink', label='Renin')
p_patch=mpatches.Patch(color='salmon', label='PKC')
h_patch=mpatches.Patch(color='olive', label='HIVP')
d_patch=mpatches.Patch(color='aquamarine', label='D2')
c_patch=mpatches.Patch(color='mediumslateblue', label='COX')
a_patch=mpatches.Patch(color='gold', label='AT1')
z_patch=mpatches.Patch(color='limegreen', label='5HT3')
y_patch=mpatches.Patch(color='orange', label='5HT1A')
x_patch=mpatches.Patch(color='orchid', label='5HT')
# Plot legend
# add_axes(left,bottom,width,height)
axlegend=fig.add_axes([0.04,0.71,0.2,0.2])
axlegend.axis('off')
axlegend.legend(handles=[x_patch,
                         y_patch,
                         z_patch,
                         a_patch,
                         c_patch,
                         d_patch,
                         h_patch,
                         p_patch,
                         r_patch,
                         s_patch,
                         t_patch,
                         i_patch,],
                title='Activity Class\n')
print("Dendrogram heatmap plotting completed...")
# save dendrogram
print("Saving dendrogram...")
# output = "C:\Users\Lucy\knime-workspace\Experiment File - Clustering\Output\New\cluster_%d_%s.png" %(size,coef)
output = "/home/lip13lm/cluster/output/cluster_%d_%s.png" %(size,coef)
fig.savefig(output, format='png')
print("Dendrogram saved")
Thanks in advance.