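I have a dataset with about 400 samples and about 20 regions. In this dataset, "1" marks a broken region (a break in the DNA in that region) and "0" marks an intact region. I want to cluster this data with seaborn.clustermap. Because broken regions carry more information than intact ones, my first choice was the Jaccard distance. However, I have many empty rows (no breaks at all), which makes the distance undefined (0/0 -> nan) and the clustering fail. To work around this, I tried to supply my own linkage matrix, but the documentation is very sparse and I can't figure it out. Any ideas?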
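To make the 0/0 case concrete, here is a minimal sketch that computes the Jaccard distance by hand with NumPy. The helper `jaccard_distance` and the three example vectors are hypothetical and only for illustration; this is not scipy's implementation, whose handling of two all-zero vectors may differ between versions.

```python
import numpy as np

def jaccard_distance(u, v):
    """Jaccard distance between two 0/1 vectors, computed by hand."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    mismatches = np.sum(u != v)  # positions where exactly one vector has a 1
    union = np.sum(u | v)        # positions where at least one vector has a 1
    # Suppress the floating-point warning so the 0/0 case just yields nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.float64(mismatches) / np.float64(union)

# Hypothetical rows: two samples with breaks, one sample without any break.
broken_a = [1, 0, 1, 0]
broken_b = [1, 1, 0, 0]
empty = [0, 0, 0, 0]

print(jaccard_distance(broken_a, broken_b))  # 2 mismatches / 3 in the union
print(jaccard_distance(empty, empty))        # nan, because the union is empty
```

Two rows without any break have an empty union, so the ratio is undefined; that is the case I am trying to handle in my own attempt: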
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy
import scipy.spatial.distance
# my dataset is called 'df'
print(df.shape)
## = (464, 23) ##
Y = scipy.spatial.distance.pdist(df, metric='jaccard')
Y = np.nan_to_num(Y) # distance matrix
linkage = scipy.cluster.hierarchy.linkage(Y, method='average')  # linkage matrix
print(len(Y))
## 107416 ##
print(len(linkage))
## 463 ##
cmap = sns.cubehelix_palette(as_cmap=True, rot=-.3, light=1)
sns.clustermap(df, cmap=cmap, row_linkage=linkage, col_linkage=linkage)
plt.show()
This results in the following error message:
Traceback (most recent call last):
File "/Users/nienke/Documents/stage/scripts/structuralvariants/realcluster.py", line 32, in <module>
sns.clustermap(df, cmap=cmap, row_linkage=linkage, col_linkage=linkage)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/seaborn/matrix.py", line 1301, in clustermap
**kwargs)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/seaborn/matrix.py", line 1142, in plot
self.plot_matrix(colorbar_kws, xind, yind, **kws)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/seaborn/matrix.py", line 1100, in plot_matrix
self.data2d = self.data2d.iloc[yind, xind]
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1367, in __getitem__
return self._getitem_tuple(key)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1737, in _getitem_tuple
self._has_valid_tuple(tup)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 204, in _has_valid_tuple
if not self._has_valid_type(k, i):
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1674, in _has_valid_type
return self._is_valid_list_like(key, axis)
File "/Users/nienke/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1731, in _is_valid_list_like
raise IndexError("positional indexers are out-of-bounds")
IndexError: positional indexers are out-of-bounds
Thanks a lot.
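One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the same linkage (463 merges, built from the 464 rows) is passed as both `row_linkage` and `col_linkage`, so the column ordering derived from it would contain indices beyond the 23 columns. Under that assumption, each axis needs its own linkage. The example below uses randomly generated stand-in data of the same shape and a hypothetical helper `jaccard_linkage`; it relies only on NumPy and scipy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical stand-in for the real data: 464 samples x 23 regions,
# True = broken region, False = intact region.
rng = np.random.default_rng(0)
data = rng.random((464, 23)) < 0.1

def jaccard_linkage(matrix, method="average"):
    """Build a linkage over the rows of `matrix` using Jaccard distance."""
    distances = pdist(matrix, metric="jaccard")
    # Same workaround as in the question: treat undefined 0/0 distances as 0,
    # so that rows without any break end up grouped together.
    distances = np.nan_to_num(distances)
    return linkage(distances, method=method)

row_linkage = jaccard_linkage(data)    # clusters the 464 samples
col_linkage = jaccard_linkage(data.T)  # clusters the 23 regions

print(row_linkage.shape)  # (463, 4)
print(col_linkage.shape)  # (22, 4)
```

A linkage over n observations has n - 1 rows, which is why the two shapes differ. The two matrices could then be passed as `sns.clustermap(df, cmap=cmap, row_linkage=row_linkage, col_linkage=col_linkage)`. Alternatively, `col_cluster=False` would skip column clustering altogether.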