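Matplotlib is very confusing to me. I have a pd.DataFrame with the columns x, y and cluster (named c in the sample data below). I want to plot this data on an x-y plot where each cluster has a different color and an annotation saying which cluster it is.
I am able to do these separately. To plot the data in different colors: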
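import numpy as np
import matplotlib.pyplot as plt
# One plt.plot call per cluster, so each takes the next color in the cycle
for c in np.unique(data['c'].tolist()):
    df = data[data['c'].isin([c])]
    plt.plot(df['x'].tolist(), df['y'].tolist(), 'o')
plt.show()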
This produces the plot with colored points.
For the annotations:
fig, ax = plt.subplots()
# Take the coordinates from the full frame so they line up with the labels
x = data['x'].tolist()
y = data['y'].tolist()
ax.scatter(x, y)
for i, txt in enumerate(data['c'].tolist()):
    ax.annotate(txt, (x[i], y[i]))
plt.show()
This produces the annotated plot.
How do I combine the two? I don't understand how to mix the figure / axes / plot APIs.
Sample data:
pd.DataFrame({'c': ['News', 'Hobbies & Interests', 'Arts & Entertainment', 'Internal Use', 'Business', 'Internal Use', 'Internal Use', 'Ad Impression Fraud', 'Arts & Entertainment', 'Adult Content', 'Arts & Entertainment', 'Internal Use', 'Internal Use', 'Reference', 'News', 'Shopping', 'Food & Drink', 'Internal Use', 'Internal Use', 'Reference'],
'x': [-95.44078826904297, 127.71454620361328, -491.93121337890625, 184.5579071044922, -191.46273803710938, 95.22545623779297, 272.2229919433594, -67.099365234375, -317.60797119140625, -175.90196228027344, -491.93121337890625, 214.3858642578125, 184.5579071044922, 346.4012756347656, -151.8809051513672, 431.6130676269531, -299.4017028808594, 184.5579071044922, 184.5579071044922, 241.29026794433594],
'y': [-40.87070846557617, 245.00514221191406, 43.07831954956055, -458.2991638183594, 270.4497985839844, -453.2981262207031, -439.6551513671875, -206.3104248046875, 205.25787353515625, -58.520164489746094, 43.07831954956055, -182.91664123535156, -458.2991638183594, 19.559282302856445, -281.3316650390625, 103.6922378540039, 280.2445373535156, -458.2991638183594, -458.2991638183594, -113.96920776367188]})
Answer 0 (score: 2)
For convenience I will use the df.plot.scatter syntax, but it should work (almost) the same way with ax.scatter.
OK, so using your sample data, you can specify a cmap as described in the docs:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'c': ['News', 'Hobbies & Interests', 'Arts & Entertainment', 'Internal Use', 'Business', 'Internal Use', 'Internal Use', 'Ad Impression Fraud', 'Arts & Entertainment', 'Adult Content', 'Arts & Entertainment', 'Internal Use', 'Internal Use', 'Reference', 'News', 'Shopping', 'Food & Drink', 'Internal Use', 'Internal Use', 'Reference'],
'x': [-95.44078826904297, 127.71454620361328, -491.93121337890625, 184.5579071044922, -191.46273803710938, 95.22545623779297, 272.2229919433594, -67.099365234375, -317.60797119140625, -175.90196228027344, -491.93121337890625, 214.3858642578125, 184.5579071044922, 346.4012756347656, -151.8809051513672, 431.6130676269531, -299.4017028808594, 184.5579071044922, 184.5579071044922, 241.29026794433594],
'y': [-40.87070846557617, 245.00514221191406, 43.07831954956055, -458.2991638183594, 270.4497985839844, -453.2981262207031, -439.6551513671875, -206.3104248046875, 205.25787353515625, -58.520164489746094, 43.07831954956055, -182.91664123535156, -458.2991638183594, 19.559282302856445, -281.3316650390625, 103.6922378540039, 280.2445373535156, -458.2991638183594, -458.2991638183594, -113.96920776367188]})
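# Encode each category as an integer code that can be mapped to a color
df['col'] = df.c.astype('category').cat.codes
# Resample 'jet' so it has one entry per category
cmap = plt.get_cmap('jet', df.c.nunique())
ax = df.plot.scatter(
    x='x', y='y', c='col',
    cmap=cmap
)
plt.show()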
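Here get_cmap takes the name of a cmap (you can find the names of the various maps on this example page) and an integer giving the number of entries wanted in the lookup table.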
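As a quick check of what the integer argument does, the sketch below resamples 'jet' to a handful of entries and looks at the result. The value 5 is an arbitrary choice for illustration, not something taken from the sample data.

```python
import matplotlib.pyplot as plt

n_clusters = 5  # arbitrary number, chosen only for illustration
cmap = plt.get_cmap("jet", n_clusters)

# The resampled colormap has exactly n_clusters entries in its lookup table
print(cmap.N)

# Calling the colormap with an integer index returns the RGBA tuple of that entry
colors = [cmap(i) for i in range(n_clusters)]
print(colors[0], colors[-1])
```

With one entry per category, the integer codes in the col column each land on a distinct color.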
If you want to add the annotations and suppress the colorbar, use:
ax = df.plot.scatter(
    x='x', y='y', c='col',
    cmap=cmap, colorbar=False
)
for i, txt in enumerate(df['c'].tolist()):
    ax.annotate(txt, (df.x[i], df.y[i]))
plt.show()
Tip: if the markers are too small, use the s argument of plt.scatter(x, y, s=None, c=None, **kwds) to change their size.
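For comparison, here is a rough sketch of the same plot written directly against ax.scatter, with the marker size set through s. The three rows are rounded from the sample data to keep the block short, and s=80 is an arbitrary value picked for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Three rows rounded from the sample data, same column names
df = pd.DataFrame({
    "c": ["News", "Business", "News"],
    "x": [-95.4, -191.5, -151.9],
    "y": [-40.9, 270.4, -281.3],
})
codes = df["c"].astype("category").cat.codes
cmap = plt.get_cmap("jet", df["c"].nunique())

fig, ax = plt.subplots()
# s is the marker area in points squared; larger values give bigger dots
points = ax.scatter(df["x"], df["y"], c=codes, cmap=cmap, s=80)

# Label every point with its cluster name
for x, y, txt in zip(df["x"], df["y"], df["c"]):
    ax.annotate(txt, (x, y))
```

Call plt.show() afterwards to display the figure. Unlike df.plot.scatter, ax.scatter does not add a colorbar on its own, so there is nothing to suppress here.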
Answer 1 (score: 0)
Surprisingly, simply combining the two approaches also solves it:
fig, ax = plt.subplots()
fig.set_size_inches(20, 20)
# data is the frame from the question (columns c, x, y)
x = data['x'].tolist()
y = data['y'].tolist()
ax.scatter(x, y)
for i, txt in enumerate(data['c'].tolist()):
    ax.annotate(txt, (x[i], y[i]))
# Draw each cluster again on the same axes so it gets its own color
for c in np.unique(data['c'].tolist()):
    df = data[data['c'].isin([c])]
    plt.plot(df['x'].tolist(), df['y'].tolist(), 'o')
plt.show()
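A tidier variant of the same idea, offered only as a sketch: do everything on a single Axes, with one scatter call per cluster so that each cluster gets its own color and legend entry, then annotate the points. It assumes a frame called data with the columns c, x and y; the four rows are rounded from the sample data, and the figure size and legend title are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four rows rounded from the sample data, same column names
data = pd.DataFrame({
    "c": ["News", "Business", "News", "Shopping"],
    "x": [-95.4, -191.5, -151.9, 431.6],
    "y": [-40.9, 270.4, -281.3, 103.7],
})

fig, ax = plt.subplots(figsize=(8, 8))

# groupby yields one sub-frame per cluster; each scatter call
# takes the next color from the default color cycle
for name, group in data.groupby("c"):
    ax.scatter(group["x"], group["y"], label=name)

# Annotate every point with its cluster name
for x, y, txt in zip(data["x"], data["y"], data["c"]):
    ax.annotate(txt, (x, y))

ax.legend(title="cluster")
```

Call plt.show() to display the result. The default color cycle has ten colors, so with more clusters than that the colors start to repeat; in that case the colormap approach from the first answer may be the better fit.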