Question

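I have a large dataset whose distribution is logarithmic. I want to make a heat map, so I compute a 2D histogram and pass it to imshow. Because the data is logarithmic, I pass the log of the data to the histogram. When I make the plot, however, I want the axes restored (i.e., labeled with 10^(bin value)) and shown as log axes. If I set the axes to log scale, the image looks all skewed. The data was already "logged" when I passed it to the histogram, so I don't want the image affected, only the axes. So, in the example below, I want the image on the left with the axes on the right.

I suppose I could do this with fake overlaid axes, but I'd rather not resort to that kind of thing if there is a better way...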

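import numpy as np
import matplotlib.pyplot as plt

# Parenthesize the exponent so values span 10**0 .. 10**5, matching the extents below
x = 10**(np.random.random(10000)*5)
y = 10**(np.random.random(10000)*5)

# histogram2d(y, x): rows of samps are log10(y) bins, columns are log10(x) bins
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50)

ax = plt.subplot(121)
# origin='lower' puts the first row (smallest y) at the bottom
plt.imshow(samps, origin='lower', extent=[0, 5, 0, 5])
plt.xlabel('Log10 X')
plt.ylabel('Log10 Y')

ax = plt.subplot(122)
plt.imshow(samps, origin='lower', extent=[10**0, 10**5, 10**0, 10**5])
plt.xlabel('X')
plt.ylabel('Y')
plt.xscale('log')
plt.yscale('log')
plt.show()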
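Part of why the right-hand panel looks skewed: imshow treats its extent as linear data coordinates, so the 50 bin columns are spaced evenly between 1 and 10^5, and on a log axis most of them end up crowded into the top decade. One alternative, which neither answer below uses, is to draw the histogram with pcolormesh and pass it the bin edges converted back to real units (10**edges). Each cell then sits at its true log-spaced position, and the log-scaled axes supply real tick labels. A minimal sketch, assuming data spanning 10^0 to 10^5; the seeded rng and the names H, xedges, yedges are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded generator
# Assumed data: values spread over 10**0 .. 10**5
x = 10 ** (rng.random(10000) * 5)
y = 10 ** (rng.random(10000) * 5)

# Bin in log space, as in the question
H, log_xedges, log_yedges = np.histogram2d(np.log10(x), np.log10(y), bins=50)

# Convert the bin edges back to real units for plotting
xedges = 10 ** log_xedges
yedges = 10 ** log_yedges

fig, ax = plt.subplots()
# histogram2d(x, y) puts x bins along the rows of H, while pcolormesh
# expects rows to follow y, hence the transpose
mesh = ax.pcolormesh(xedges, yedges, H.T)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('X')
ax.set_ylabel('Y')
fig.colorbar(mesh, ax=ax, label='count')
# Call plt.show() (or fig.savefig(...)) to display the figure
```

Because the cells are drawn between the real edges, no fake tick labels are needed; the trade-off is that pcolormesh draws one quadrilateral per bin instead of a single image, which is fine at 50x50.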

Answer 1

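You need to use a custom formatter. Here is an example from the matplotlib docs: https://matplotlib.org/examples/pylab_examples/custom_ticker1.html

I tend to use FuncFormatter, as that example does. The main trick is that your function has to accept two arguments, x and pos: x is the tick value, and pos is the tick's position index (it may be None, and it is not needed here). Possibly not by design, but FuncFormatter also works as a decorator, which is what I do below: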

# %matplotlib inline  (IPython/Jupyter magic; leave it out in a plain script)
import numpy as np
import matplotlib.pyplot as plt

@plt.FuncFormatter
def fake_log(x, pos):
    'The two args are the value and tick position'
    # x is a log10 exponent here, so label it as 10^x
    return r'$10^{%d}$' % (x)

x = 10**(np.random.random(10000)*5)
y = 10**(np.random.random(10000)*5)

samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50)

fig, ax1 = plt.subplots()
# origin='lower' keeps the smallest y bin at the bottom
ax1.imshow(samps, origin='lower', extent=[0, 5, 0, 5])
ax1.xaxis.set_major_formatter(fake_log)
ax1.yaxis.set_major_formatter(fake_log)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
plt.show()
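One caveat with fake_log: %d truncates toward zero, so a tick value landing a hair below an integer (say 2.9999999) would be labeled 10^2, and fractional ticks such as 0.5 would repeat their neighbour's label. A minimal sketch of a sturdier variant, assuming you want exactly one tick per decade: it pins tick positions with MultipleLocator(1) and rounds before formatting. The seeded rng and the edge-based extent are illustrative choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop it in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator

@FuncFormatter
def fake_log(x, pos):
    """x is the tick value (a log10 exponent); pos is the tick index, unused here."""
    # round() first, so 2.9999999 becomes 3 instead of being truncated to 2
    return r'$10^{%d}$' % round(x)

rng = np.random.default_rng(0)  # illustrative seeded generator
x = 10 ** (rng.random(10000) * 5)
y = 10 ** (rng.random(10000) * 5)
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50)

fig, ax = plt.subplots()
# Extent taken from the actual log10 bin edges; origin='lower' keeps small y at the bottom
ax.imshow(samps, origin='lower',
          extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
for axis in (ax.xaxis, ax.yaxis):
    axis.set_major_locator(MultipleLocator(1))  # one tick per decade
    axis.set_major_formatter(fake_log)
ax.set_xlabel('X')
ax.set_ylabel('Y')
```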

Answer 2

If you only want to change the labels, you can access them directly through plt.gca().set_xticklabels and plt.gca().set_yticklabels. The idea is to change the text (the _text property) of these tick labels.
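A minimal sketch of the label-only idea, using the public set_xticks/set_xticklabels calls with a list of strings rather than the private _text attribute (that substitution is an assumption, not necessarily what this answer originally did). It fixes ticks at the integer exponents 0 to 5 of the question's extent=[0, 5, 0, 5] and labels them 10^0 through 10^5; the seeded rng is illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop it in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded generator
x = 10 ** (rng.random(10000) * 5)
y = 10 ** (rng.random(10000) * 5)
samps, yedges, xedges = np.histogram2d(np.log10(y), np.log10(x), bins=50)

fig, ax = plt.subplots()
ax.imshow(samps, origin='lower', extent=[0, 5, 0, 5])

# Fixed tick positions in log10 units, and the matching 10^k labels
exponents = np.arange(0, 6)
labels = [r'$10^{%d}$' % e for e in exponents]
ax.set_xticks(exponents)
ax.set_xticklabels(labels)
ax.set_yticks(exponents)
ax.set_yticklabels(labels)
ax.set_xlabel('X')
ax.set_ylabel('Y')
```

Because the labels are fixed strings tied to fixed tick positions, they will not adapt if the extent changes; a FuncFormatter recomputes labels for whatever ticks matplotlib chooses.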

How to apply logarithmic axis labels without log-scaling the image (matplotlib imshow)
