I have a dataset:
a b c d
10-Apr-86 Jimmy 1 Silly.doc
11-Apr-86 Minnie 2 Lala.doc
12-Apr-86 Jimmy 3 Goofy.doc
13-Apr-86 Minnie 4 Hilarious.doc
14-Apr-86 Jimmy 5 Joyous.doc
15-Apr-86 Eliot 6 Crackingup.doc
16-Apr-86 Jimmy 7 Funny.doc
17-Apr-86 Eliot 8 Happy.doc
18-Apr-86 Minnie 9 Mirthful.doc
Using the following code in Python 2.7.12...
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('python.csv')
df['a'] = pd.to_datetime(df['a'])  # parse dates before pivoting so df and df_wanted share one date type
df_wanted = pd.pivot_table(
    df,
    index='a',
    columns='b',
    values='c')
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(df_wanted.index, df_wanted['Jimmy'], s=50, c='b', marker="s", label='Jimmy')
ax1.scatter(df_wanted.index, df_wanted['Minnie'], s=50, c='r', marker="o", label='Minnie')
ax1.scatter(df_wanted.index, df_wanted['Eliot'], s=50, c='g', marker="8", label='Eliot')
plt.legend(loc='upper left')
# label every point with its document name (column d)
for k, v in df.set_index('a').iterrows():
    plt.text(k, v['c'], v['d'])
plt.show()
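...I can create the following visualization in matplotlib:
Problem: this is only a toy dataset. When I apply this code to my real dataset of more than 3000 points, all the data labels run together into one illegible black blob.
I'd like to avoid this by using the code here, so that a point's data label is shown only when I click on it.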
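For context, click-to-label behaviour in matplotlib generally rests on its pick_event mechanism. The following is a minimal sketch, not the linked code: it assumes the same toy x and y as the linked snippet plus hypothetical document names (p1.doc ... p5.doc), and keeps a single hidden annotation that is moved to whichever point is clicked, instead of drawing every label up front.

```python
import matplotlib.pyplot as plt

# Toy coordinates like the linked snippet; the labels are hypothetical.
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
labels = ['p1.doc', 'p2.doc', 'p3.doc', 'p4.doc', 'p5.doc']

fig, ax = plt.subplots()
scat = ax.scatter(x, y, picker=True)  # picker=True makes the points clickable

# One annotation for the whole plot, hidden until a point is picked.
note = ax.annotate('', xy=(0, 0), xytext=(10, 10), textcoords='offset points',
                   bbox=dict(boxstyle='round', fc='w'))
note.set_visible(False)


def on_pick(event):
    # event.ind holds the positions (within this collection) of the picked points;
    # only the first one is labelled.
    i = event.ind[0]
    note.xy = (x[i], y[i])
    note.set_text(labels[i])
    note.set_visible(True)
    fig.canvas.draw_idle()


fig.canvas.mpl_connect('pick_event', on_pick)
```

Call plt.show() to try it interactively. Since at most one label is drawn at a time, the number of points no longer matters for legibility.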
The part of the linked code I'm stuck on is this:
import matplotlib.pyplot as plt
# DataCursor is the class defined in the linked code
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
scat = ax.scatter(x, y)
DataCursor(scat, x, y)
plt.show()
Obviously I need to replace x and y with my pivot-table columns, but I don't know how to make scat = ax.scatter(x, y) or DataCursor(scat, x, y) work with my data.
I tried the following:
scat = ax1.scatter(df_wanted.index, df_wanted['Minnie'], s=50, c='b', marker="s")
scat1 = ax1.scatter(df_wanted.index, df_wanted['Jimmy'], s=50, c='r', marker="o")
scat2 = ax1.scatter(df_wanted.index, df_wanted['Eliot'], s=50, c='g', marker="8")
DataCursor(scat, df_wanted.index, df_wanted['Minnie'])
DataCursor(scat1, df_wanted.index, df_wanted['Jimmy'])
DataCursor(scat2, df_wanted.index, df_wanted['Eliot'])
plt.show()
But I get this error: TypeError: Invalid Type Promotion
Update: using the code from here, I can print the document names to the console:
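A likely cause of this error (an assumption, since it depends on how the linked DataCursor stores its points) is that the datetime index and a float column end up in one NumPy array: NumPy has no common dtype for datetime64 and float64. The pivot table also holds NaN on every date where a name has no row. The sketch below uses stand-in values, not the real df_wanted, to reproduce the failure and shows one workaround: drop the NaN rows and convert the dates to plain floats with matplotlib.dates.date2num.

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Stand-ins for df_wanted.index and df_wanted['Minnie']: a DatetimeIndex and a
# float column that is NaN on a date where Minnie has no row.
idx = pd.to_datetime(['1986-04-10', '1986-04-11', '1986-04-13'])
minnie = pd.Series([np.nan, 2.0, 4.0], index=idx)

# Putting datetime64 and float64 into one array fails with a TypeError.
try:
    np.column_stack((minnie.index.values, minnie.values))
    stack_failed = False
except TypeError as exc:  # the message wording differs between NumPy versions
    stack_failed = True
    print('TypeError:', exc)

# Workaround sketch: drop the NaN rows, then convert the dates to matplotlib's
# float day numbers so that x and y are both float arrays.
clean = minnie.dropna()
x = mdates.date2num(clean.index.to_pydatetime())
y = clean.values
points = np.column_stack((x, y))  # a (2, 2) float64 array
print(points.dtype, points.shape)
```

The resulting x and y could then be given to both ax1.scatter(x, y) and DataCursor(scat, x, y), one pair per name; calling ax1.xaxis_date() keeps the float x axis labelled as dates.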
import matplotlib.pyplot as plt
import numpy as npy
import pandas as pd
df = pd.read_csv('python.csv')
df_wanted = pd.pivot_table(
    df,
    index='a',
    columns='b',
    values='c')
df_wanted.index = pd.to_datetime(df_wanted.index)
# picking on a scatter plot
c = 'r'
c1 = 'b'
c2 = 'g'
s = 85
y = df_wanted['Minnie']
z = df_wanted['Jimmy']
f = df_wanted['Eliot']
x = df_wanted.index
def onpick3(event):
    # event.ind holds positions within the clicked series only
    ind = event.ind
    print(npy.take(df['d'], ind))
fig = plt.figure()
ax1 = fig.add_subplot(111)
col = ax1.scatter(x, y, s=s, c=c, picker=True, label='Minnie')
col1 = ax1.scatter(x, z, s=s, c=c1, picker=True, label='Jimmy')
col2 = ax1.scatter(x, f, s=s, c=c2, picker=True, label='Eliot')
plt.legend(loc='upper left')
#fig.savefig('pscoll.eps')
fig.canvas.mpl_connect('pick_event', onpick3)
plt.show()
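The problem now is that the document names it prints are wrong. I think this is because the ind numbers are counted separately within each series. I need a way to combine all the series and give every point one index across the whole set.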
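An alternative that avoids needing one combined index, sketched under the assumption that one scatter collection per name is acceptable: plot each name's rows from the long-format df instead of from the pivot table, and keep a dictionary from each collection to its document names. In the pick handler, event.artist tells you which collection was clicked and event.ind indexes into that collection only, so the lookup lines up exactly. The toy data is inlined with io.StringIO in place of python.csv.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The toy dataset from the question, inlined in place of python.csv.
CSV = """a,b,c,d
10-Apr-86,Jimmy,1,Silly.doc
11-Apr-86,Minnie,2,Lala.doc
12-Apr-86,Jimmy,3,Goofy.doc
13-Apr-86,Minnie,4,Hilarious.doc
14-Apr-86,Jimmy,5,Joyous.doc
15-Apr-86,Eliot,6,Crackingup.doc
16-Apr-86,Jimmy,7,Funny.doc
17-Apr-86,Eliot,8,Happy.doc
18-Apr-86,Minnie,9,Mirthful.doc
"""
df = pd.read_csv(io.StringIO(CSV))
df['a'] = pd.to_datetime(df['a'], format='%d-%b-%y')

fig, ax = plt.subplots()
doc_names = {}  # scatter collection -> document names, in plotting order

for name, sub in df.groupby('b'):
    # Long-format rows contain no NaN, so position i in the collection is row i of sub.
    col = ax.scatter(mdates.date2num(sub['a'].values), sub['c'],
                     s=50, label=name, picker=True)
    doc_names[col] = sub['d'].tolist()

ax.xaxis_date()  # show the float day numbers as dates
ax.legend(loc='upper left')


def on_pick(event):
    # event.artist is the clicked collection; event.ind indexes into it only.
    names = [doc_names[event.artist][i] for i in event.ind]
    print(names)
    return names


fig.canvas.mpl_connect('pick_event', on_pick)
```

Because every collection carries a label, the legend also gets one entry per name. Call plt.show() to click through the points.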
Answer:
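I found a solution. I realised I wanted to follow this example (Matplotlib scatterplot; colour as a function of a third variable), but I first needed to build a single list of x values and a single list of y values for all the data, rather than separate x and y lists for each series.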
import matplotlib.pyplot as plt
import numpy as npy
import pandas as pd
df = pd.read_csv('python.csv') #upload dataset
df['a'] = pd.to_datetime(df['a']) #convert date column to usable format
x = list(df['a'].values.flatten()) #get dataframe column data in list format
y = list(df['c'].values.flatten()) #get dataframe column data in list format
var_names = list(df['b'].values.flatten()) #get dataframe column data in list format
var_names1 = list(set(var_names)) #get unique values from column b (names)
d = {var_names1[n]: n for n in range(len(var_names1))} #generate dictionary that assigns a number to each unique name in col B
namesAsNumbers = [d[z] for z in var_names] #replace names with numbers in column B
c = namesAsNumbers
# picking on a scatter plot: the user picks a point
def onpick3(event):
    ind = event.ind
    print(npy.take(df['d'], ind)) #print the document name associated with the point that's been picked
fig = plt.figure()
ax1 = fig.add_subplot(111)
col = ax1.scatter(x, y, s=100, c=c, picker=True)
#fig.savefig('pscoll.eps')
fig.canvas.mpl_connect('pick_event', onpick3)
plt.legend()
plt.show()
My only remaining problem: I can't get the legend to show.
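plt.legend() finds nothing to show here because the single scatter call has no label, and even a labelled single collection would produce only one legend entry, not one per name. One workaround, sketched below with a viridis colormap and proxy Line2D handles (both are assumptions, not part of the answer above): pass an explicit Normalize to scatter so each code's colour is known, then build one legend handle per name in that colour. The sketch also swaps the set/dictionary code for pd.factorize, which returns the integer codes and the unique names in order of first appearance.

```python
import io

import matplotlib.colors as mcolors
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# The toy dataset from the question, inlined in place of python.csv.
CSV = """a,b,c,d
10-Apr-86,Jimmy,1,Silly.doc
11-Apr-86,Minnie,2,Lala.doc
12-Apr-86,Jimmy,3,Goofy.doc
13-Apr-86,Minnie,4,Hilarious.doc
14-Apr-86,Jimmy,5,Joyous.doc
15-Apr-86,Eliot,6,Crackingup.doc
16-Apr-86,Jimmy,7,Funny.doc
17-Apr-86,Eliot,8,Happy.doc
18-Apr-86,Minnie,9,Mirthful.doc
"""
df = pd.read_csv(io.StringIO(CSV))
df['a'] = pd.to_datetime(df['a'], format='%d-%b-%y')

# One integer code per name, plus the names themselves (first-appearance order).
codes, names = pd.factorize(df['b'])

cmap = plt.get_cmap('viridis')
norm = mcolors.Normalize(vmin=0, vmax=len(names) - 1)  # fixes code -> colour mapping

fig, ax = plt.subplots()
col = ax.scatter(mdates.date2num(df['a'].values), df['c'], s=100,
                 c=codes, cmap=cmap, norm=norm, picker=True)
ax.xaxis_date()

# Proxy handles: a marker-only Line2D per name, coloured as scatter coloured its code.
handles = [Line2D([], [], marker='o', linestyle='', color=cmap(norm(i)), label=name)
           for i, name in enumerate(names)]
ax.legend(handles=handles, loc='upper left')
```

Since everything is still one collection, event.ind in the existing onpick3 handler keeps indexing straight into the rows of df.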