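I have a large dataset that is logarithmically distributed. I want to make a heat map, so I compute a 2D histogram and pass it to imshow. Because the data are logarithmic, I pass the log of the data to the histogram. However, when I make the plot, I want the axes back in the original units (i.e. 10^(histogram bin values)) and log-scaled. If I set the axes to log scale, the image looks completely skewed. The data are already "logged" when I pass them to the histogram, so I don't want the image to be affected, only the axes. So, in the example below, I want the image on the left with the axes on the right.
I think I could do this with a fake overlaid axis, but I'd rather not resort to that kind of thing if there is a better way...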
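import numpy as np
import matplotlib.pyplot as plt
# Parenthesize the exponent so the data span 10**0 to 10**5 (log10 from 0 to 5)
x = 10**(np.random.random(10000)*5)
y = 10**(np.random.random(10000)*5)
# histogram2d returns edges in argument order: y first, then x
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50, range=[[0, 5], [0, 5]])
ax = plt.subplot(121)
# origin='lower' puts the first histogram row (smallest y) at the bottom
plt.imshow(samps, extent=[0, 5, 0, 5], origin='lower')
plt.xlabel('Log10 X')
plt.ylabel('Log10 Y')
ax = plt.subplot(122)
plt.imshow(samps, extent=[10**0, 10**5, 10**0, 10**5], origin='lower')
plt.xlabel('X')
plt.ylabel('Y')
plt.xscale('log')
plt.yscale('log')
plt.show()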
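The skew in the right-hand panel comes from imshow itself: it treats its pixels as evenly spaced in the data units of the extent, so on a log-scaled axis the low end is stretched and the high end squeezed. For comparison, here is a minimal sketch of a different technique from the tick-formatter approach (it is not something the asker tried): draw the same log10 histogram with pcolormesh and pass the bin edges converted back with 10**edges, so each cell sits between its own edges on genuine log axes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = 10**(np.random.random(10000) * 5)
y = 10**(np.random.random(10000) * 5)

# Bin in log10 space as in the question: rows follow y, columns follow x
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50)

fig, ax = plt.subplots()
# pcolormesh draws each cell between its own edges, so giving it the edges in
# linear units (10**edges) keeps every cell aligned once the axes are log-scaled
mesh = ax.pcolormesh(10**xedges, 10**yedges, samps)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('X')
ax.set_ylabel('Y')
```

Call plt.show() afterwards to display it. Unlike imshow, pcolormesh does not assume uniform pixels, which is why the log axes do not distort it.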
Answer 0 (score: 2)
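You need a custom formatter. Here is an example from the matplotlib documentation: https://matplotlib.org/examples/pylab_examples/custom_ticker1.html
I tend to use FuncFormatter, as that example does. The main trick is that your function has to accept two arguments, x and pos: x is the tick value and pos is the tick's position (its index along the axis, or None). The function below simply ignores pos. Perhaps not even by design, but you can also use FuncFormatter as a decorator, which is what I do below: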
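# In a Jupyter notebook, also run: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
@plt.FuncFormatter
def fake_log(x, pos):
    'The two args are the value and tick position'
    # x is a tick value in log10 units, so label it as a power of ten
    return r'$10^{%d}$' % (x)
# Parenthesize the exponent so the data span 10**0 to 10**5
x = 10**(np.random.random(10000)*5)
y = 10**(np.random.random(10000)*5)
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50, range=[[0, 5], [0, 5]])
fig, ax1 = plt.subplots()
# origin='lower' puts the first histogram row (smallest y) at the bottom
ax1.imshow(samps, extent=[0, 5, 0, 5], origin='lower')
ax1.xaxis.set_major_formatter(fake_log)
ax1.yaxis.set_major_formatter(fake_log)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
plt.show()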
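The fake_log formatter relies on the ticks landing on whole numbers, because %d truncates: a tick at 0.5 would also be labelled 10^0. A minimal sketch that guards against this, assuming one tick per decade is acceptable: MultipleLocator(1) pins the ticks to integer exponents, %g keeps any fractional exponent intact, and range= fixes the histogram bounds so the extent can be taken from the returned edges. The names log10_label and plot_log_hist are hypothetical helpers, not matplotlib API.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def log10_label(x, pos):
    """Label a tick at log10 value x as a power of ten; pos (tick index) is unused."""
    # %g keeps fractional exponents (10^{0.5}) instead of truncating them like %d
    return r'$10^{%g}$' % x


def plot_log_hist(x, y, bins=50, lo=0, hi=5):
    """Histogram log10 of the data and label both axes in powers of ten (hypothetical helper)."""
    # Rows of samps follow y and columns follow x, matching imshow's layout
    samps, yedges, xedges = np.histogram2d(
        np.log10(y), np.log10(x), bins=bins, range=[[lo, hi], [lo, hi]])
    fig, ax = plt.subplots()
    # Take the extent from the actual bin edges so the labels match the bins
    ax.imshow(samps, extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
              origin='lower')
    for axis in (ax.xaxis, ax.yaxis):
        axis.set_major_locator(MultipleLocator(1))  # one tick per decade
        axis.set_major_formatter(FuncFormatter(log10_label))
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    return fig, ax, samps
```

For example, with x = 10**(np.random.random(10000)*5) and likewise for y, call plot_log_hist(x, y) and then plt.show().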
Answer 1 (score: 1)
If you only want to change the labels, you can set them directly with plt.gca().set_xticklabels and plt.gca().set_yticklabels, i.e. change the text (the _text attribute) of the existing tick labels without touching the image.
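The labels-only route can be sketched as follows, assuming the same log10 histogram as in the question. It fixes the tick positions with set_xticks/set_yticks before calling set_xticklabels/set_yticklabels (labels set on unfixed ticks can drift away from their ticks, and newer matplotlib versions warn about it), and it uses these public setters rather than writing the private _text attribute. The helper name relabel_as_powers_of_ten is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def relabel_as_powers_of_ten(ax, exponents):
    """Place ticks at the given log10 values and label them 10^k (hypothetical helper).

    Only the tick text changes; the image data are untouched.
    """
    labels = [r'$10^{%g}$' % k for k in exponents]
    # Fix the positions first so each label stays attached to its tick
    ax.set_xticks(exponents)
    ax.set_xticklabels(labels)
    ax.set_yticks(exponents)
    ax.set_yticklabels(labels)
    return labels


x = 10**(np.random.random(10000) * 5)
y = 10**(np.random.random(10000) * 5)
samps, yedges, xedges = np.histogram2d(
    np.log10(y), np.log10(x), bins=50, range=[[0, 5], [0, 5]])
fig, ax = plt.subplots()
ax.imshow(samps, extent=[0, 5, 0, 5], origin='lower')
labels = relabel_as_powers_of_ten(ax, [0, 1, 2, 3, 4, 5])
ax.set_xlabel('X')
ax.set_ylabel('Y')
```

Because the labels are plain strings tied to fixed positions, they will not update if you later zoom or change the limits; the FuncFormatter approach does.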