Adding colorbar labels as text on a scatter plot

Time: 2020-01-21 15:21:59

Tags: python matplotlib plot scatter

I have a scatter plot, created with:

x = list(auto['umap1'])
y = list(auto['umap2'])


final_df2 = pd.DataFrame(list(zip(x,y,communities)), columns =['x', 'y', 'cluster'])
no_clusters = max(communities)
cluster_list = list(range (min(communities), no_clusters+1))
fig2, ax = plt.subplots(figsize = (20,15))
plt.scatter(x, y, c=final_df2['cluster'], cmap=plt.get_cmap('hsv', max(cluster_list)), s=0.5)  # plt.cm.get_cmap was removed in Matplotlib 3.9
plt.title('Phenograph on UMAP - All Markers (auto)', fontsize=15)
plt.xlabel('umap_1', fontsize=15)
plt.ylabel('umap_2', fontsize=15)
plt.colorbar(extend='both',ticks = range(max(cluster_list)))
plt.show()

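I would like to know how to add the colorbar labels (the numbers 1 to 31) as text on the plot itself, each one placed on the cluster it corresponds to. The clusters are hard to tell apart by color alone, because the colormap cycles back to red.

I tried: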

n = list(final_df2['cluster'])
for i, txt in enumerate(n):
    ax.annotate(txt, (y[i], x[i]))

But that did not work.

1 answer:

Answer 0: (score: 2)

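Your annotation code writes one annotation for every point, so the result is just a sea of numbers.

Instead, find some kind of center for each cluster, for example by averaging all the points that belong to the same cluster.

Then use the coordinates of that center to position the text. You can give the text a background to make it easier to read.

Since I don't have your data, the code below simulates some points around centers that are already known.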

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

# calculate some random points to serve as cluster centers; run a few steps of a relaxing algorithm to separate them a bit
def random_distibuted_centers():
    cx = np.random.uniform(-10, 10, MAX_CLUST + 1)
    cy = np.random.uniform(-10, 10, MAX_CLUST + 1)
    for _ in range(10):
        for i in range(1, MAX_CLUST + 1):
            for j in range(1, MAX_CLUST + 1):
                if i != j:
                    dist = np.linalg.norm([cx[i] - cx[j], cy[i] - cy[j]])
                    if dist < 4:
                        cx[i] += 0.4 * (cx[i] - cx[j]) / dist
                        cy[i] += 0.4 * (cy[i] - cy[j]) / dist
    return cx, cy

N = 1000
MAX_CLUST = 31
cx, cy = random_distibuted_centers()

# for demonstration purposes, just generate some random points around the centers
x =  np.concatenate( [np.random.normal(cx[i], 2, N) for i in range(1,MAX_CLUST+1)])
y =  np.concatenate( [np.random.normal(cy[i], 2, N) for i in range(1,MAX_CLUST+1)])
communities = np.repeat(range(1,MAX_CLUST+1), N)

final_df2 = pd.DataFrame({'x':x, 'y':y, 'cluster': communities})
no_clusters = max(communities)
cluster_list = list(range (min(communities), no_clusters+1))
fig2, ax = plt.subplots(figsize = (20,15))
plt.scatter(x, y, c=final_df2['cluster'], cmap=plt.get_cmap('hsv', max(cluster_list)), s=0.5)  # plt.cm.get_cmap was removed in Matplotlib 3.9
plt.title('Phenograph on UMAP - All Markers (auto)', fontsize=15)
plt.xlabel('umap_1', fontsize=15)
plt.ylabel('umap_2', fontsize=15)
plt.colorbar(extend='both',ticks = cluster_list)

bbox_props = dict(boxstyle="circle,pad=0.3", fc="white", ec="black", lw=2, alpha=0.9)
for i in range(1,MAX_CLUST+1):
    ax.annotate(i, xy=(cx[i], cy[i]), ha='center', va='center', bbox=bbox_props)
plt.show()

[Image: example plot]
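With real data the cluster centers are not known in advance, so the averaging step described in the answer has to be done explicitly. The following is a minimal sketch of that step, assuming a DataFrame with columns `x`, `y` and `cluster` like `final_df2` in the question. The helper names `cluster_centers` and `label_clusters` and the small `demo_df` data set are hypothetical and only serve as illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt


def cluster_centers(df):
    """Return one row per cluster holding the mean x and mean y of its points."""
    return df.groupby('cluster')[['x', 'y']].mean()


def label_clusters(ax, centers):
    """Write each cluster number at its center, on a white circular background."""
    bbox_props = dict(boxstyle="circle,pad=0.3", fc="white", ec="black", lw=2, alpha=0.9)
    for cluster, row in centers.iterrows():
        ax.annotate(str(cluster), xy=(row['x'], row['y']),
                    ha='center', va='center', bbox=bbox_props)


# Tiny made-up data set (hypothetical values, for illustration only)
demo_df = pd.DataFrame({
    'x': [0.0, 2.0, 10.0, 12.0],
    'y': [0.0, 4.0, 10.0, 14.0],
    'cluster': [1, 1, 2, 2],
})
demo_centers = cluster_centers(demo_df)

fig, ax = plt.subplots()
ax.scatter(demo_df['x'], demo_df['y'], c=demo_df['cluster'], s=20)
label_clusters(ax, demo_centers)
```

Applied to the data from the question, this would amount to calling `label_clusters(ax, cluster_centers(final_df2))` before `plt.show()`. For clusters that are strongly elongated or curved, the mean can fall outside the densest region; replacing `.mean()` with `.median()` is one possible alternative.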
