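I'm trying to imitate the plot found in this question ("plotting results of hierarchical clustering ontop of a matrix of data in python"): https://github.com/SupercondActor/platform-app-angular
I've used the code found there many times without problems. But when applying it to some new data, I found an inverted pattern in the dendrogram's links.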
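As you can see in the upper part of the dark blue links, some clusters start above the branch point they join. This may be expected behavior in some cases, but it seems counterintuitive for a dendrogram to branch at a lower dissimilarity than its own sub-branches. The data being compared are various protein abundances across a range of features. One minus the correlation between these features is effectively what goes into scipy's linkage function. The code for the whole plot is below: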
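A minimal sketch of the "1 minus correlation" distance that feeds the linkage, assuming random toy data in place of `df_log2` (the DataFrame `df` and its column names are hypothetical). It also checks that scipy's `pdist` with `metric="correlation"` yields the same condensed vector, which is a handy cross-check on the matrix-to-condensed conversion.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy as hc
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for df_log2: 50 proteins (rows) x 4 features (columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["f1", "f2", "f3", "f4"])

# Correlation distance between feature columns:
# 0 = perfectly correlated, 1 = uncorrelated, 2 = perfectly anticorrelated.
dist = (1 - df.corr()).to_numpy(copy=True)
np.fill_diagonal(dist, 0.0)  # clear floating-point noise on the diagonal

# squareform converts the square matrix into the condensed vector linkage expects.
condensed = squareform(dist, checks=False)

# pdist(metric="correlation") computes the same 1 - r values directly
# from the transposed data (one row per feature).
direct = pdist(df.to_numpy().T, metric="correlation")
print(np.allclose(condensed, direct))

# Any linkage method can now be applied to the condensed distances.
Z = hc.linkage(condensed, method="average")
print(Z)
```

Each row of `Z` is one merge: the two cluster indices, the merge height, and the size of the new cluster.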
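#Making correlation matrix with dendrograms
import pylab
import matplotlib.ticker as ticker
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import squareform

# df_log2: DataFrame of log2 abundances, one column per feature (defined elsewhere)
corr = 1 - df_log2.corr()  # correlation distance, 1 - Pearson r
# checks=False skips squareform's exact symmetry/zero-diagonal test, which floating-point noise can fail
corr_condensed = squareform(corr.values, checks=False)  # convert to condensed form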
# Compute and plot first dendrogram.
fig = pylab.figure(figsize=(30,30))
ax1 = fig.add_axes([-0.1,0.1,0.35,0.6])
Y = hc.linkage(corr_condensed, method='centroid')
Z1 = hc.dendrogram(Y, orientation='left')
ax1.set_xticks([])
ax1.set_yticks([])
# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3,0.75,0.6,0.35])
Y = hc.linkage(corr_condensed, method='centroid')
Z2 = hc.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])
# Plot the correlation matrix, reordered into the dendrogram's leaf order.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx = list(Z1['leaves'])
featdict = {0: 'IndA 3K', 1: 'IndA 5.4K', 2: 'IndA 12.2K', 3: 'IndA 24K', 4: 'IndA 78.4K', 5: 'IndA 110K', 6: 'IndA 195.5K',
7: 'IndB 3K', 8: 'IndB 5.4K', 9: 'IndB 12.2K', 10: 'IndB 24K', 11: 'IndB 78.4K', 12: 'IndB 110K', 13 :'IndB 195.5K',
14: 'IndC 3K', 15: 'IndC 5.4K', 16: 'IndC 12.2K', 17: 'IndC 24K', 18: 'IndC 78.4K', 19: 'IndC 110K', 20: 'IndC 195.5K',
21: 'UnA 3K', 22: 'UnA 5.4K', 23: 'UnA 12.2K', 24: 'UnA 24K', 25: 'UnA 78.4K', 26: 'UnA 110K', 27: 'UnA 195.5K',
28: 'UnB 3K', 29: 'UnB 5.4K', 30: 'UnB 12.2K', 31: 'UnB 24K', 32: 'UnB 78.4K', 33: 'UnB 110K', 34: 'UnB 195.5K',
35: 'UnC 3K', 36: 'UnC 5.4K', 37: 'UnC 12.2K', 38: 'UnC 24K', 39: 'UnC 78.4K', 40: 'UnC 110K', 41: 'UnC 195.5K'}
carray = corr.values
corrind = carray[idx, :][:, idx]  # reorder rows and columns into dendrogram leaf order
corrflip = 1 - corrind  # convert back from distance (1 - r) to Pearson r
im = axmatrix.matshow(corrflip, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
labels = [featdict[x] for x in idx]
# Place one tick on every row/column so each label lines up with its cell
axmatrix.set_xticks(range(len(labels)))
axmatrix.set_xticklabels(labels, rotation=90)
axmatrix.set_yticks(range(len(labels)))
axmatrix.set_yticklabels(labels)
# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
pylab.colorbar(im, cax=axcolor)
fig.suptitle('Pearson Correlation \nMatrix with Centroid \nLinkage Dendrogram', x=0.13, y=0.9, fontsize=40)
fig.savefig('PearsonCorr_matrixwithdendro_normlog2.png', dpi=500, bbox_inches='tight')
Answer
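While searching for something unrelated, I stumbled on the explanation: "centroid" linkage (like "median" linkage) is not monotonic, so for some data a merge can happen at a lower height than the merges beneath it. That is exactly what shows up as these inversions (reversals) in the dendrogram. With centroid linkage they apparently cannot be avoided; methods such as "single", "complete", "average" or "ward" always produce non-decreasing merge heights. There is a good answer discussing this on StackExchange: https://stats.stackexchange.com/questions/26769/cluster-analysis-in-r-produces-reversals-on-dendrogram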
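The inversion can be reproduced with a minimal sketch, assuming a hand-picked set of three 2-D points (toy data, not the data from the question). `scipy.cluster.hierarchy.is_monotonic` reports whether the merge heights in a linkage matrix never decrease. Note also that the scipy documentation states "centroid", "median" and "ward" are only correctly defined for Euclidean distances, which 1 - r is not.

```python
import numpy as np
from scipy.cluster import hierarchy as hc

# Hypothetical toy data: points 0 and 1 are 1.0 apart, and point 2
# sits 0.9 above their midpoint (about 1.03 from each of them).
points = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.5, 0.9]])

# Centroid linkage: 0 and 1 merge at height 1.0; the merged cluster's
# centroid (0.5, 0.0) is only 0.9 from point 2, so the next merge is lower.
Z_centroid = hc.linkage(points, method="centroid")

# Average linkage: the merged cluster's distance to point 2 is the mean
# of d(0, 2) and d(1, 2), about 1.03, so heights never decrease.
Z_average = hc.linkage(points, method="average")

print("centroid heights:", Z_centroid[:, 2], "monotonic:", hc.is_monotonic(Z_centroid))
print("average heights:", Z_average[:, 2], "monotonic:", hc.is_monotonic(Z_average))
```

Swapping `method='centroid'` for a monotonic method such as `'average'` in both `hc.linkage` calls of the question's code is one way to get a dendrogram without reversals, at the cost of a different clustering criterion.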