Question

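I have a very large, sparse dataset of spam Twitter accounts, and I need to scale the x axis to be able to visualize the distributions (histogram, KDE, etc.) and CDFs of various variables (tweets_count, number of followers/following, etc.).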

    > describe(spammers_class1$tweets_count)
      var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
    1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

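In this dataset the value 0 is very important (in fact, 0 should have the highest density). With a log scale, however, these values are ignored. I thought about changing them to 0.1, for example, but it makes no sense for a spam account to have 10^-1 followers.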
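To see the effect described above in isolation, here is a minimal sketch using a small hypothetical array of follower counts (made up for illustration, not the question's data). Log-spaced bin edges have to start above zero, so every zero falls outside all bins and silently disappears from the histogram.

```python
import numpy as np

# Hypothetical follower counts with several zeros (illustrative only)
followers = np.array([0, 0, 0, 1, 5, 20, 300])

# Log-spaced bin edges must be positive: here they are 1, 10, 100, 1000
bins = np.logspace(0, 3, 4)
counts, _ = np.histogram(followers, bins=bins)

print(counts)        # [2 1 1]
print(counts.sum())  # 4: the three zeros are outside every bin and are not counted
```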

So what is the workaround in Python and matplotlib?

Answer 1

    ax1.set_xlim(0, 1e3)

Here is an example from the matplotlib documentation.

It sets the axis limits this way:

    ax1.set_xlim(1e1, 1e3)
    ax1.set_ylim(1e2, 1e3)
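A minimal sketch of the limit-setting approach on log-scaled axes, assuming the non-interactive Agg backend and some made-up plot data. Positive limits like the documentation's work as expected; a lower limit of 0, however, has no position on a log axis, so matplotlib warns and keeps the previous lower limit instead.

```python
import warnings
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.plot([10, 100, 1000], [100, 500, 1000])  # illustrative data only
ax1.set_xscale('log')
ax1.set_yscale('log')

# Positive limits, as in the documentation example, are accepted on log axes
ax1.set_xlim(1e1, 1e3)
ax1.set_ylim(1e2, 1e3)

# A non-positive lower limit is rejected with a warning; the old lower limit is kept
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ax1.set_xlim(0, 1e3)

print(ax1.get_xlim())  # lower limit is still 10, not 0
```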

Answer 2

Add 1 to every x value, then take the log:

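    import matplotlib.pyplot as plt
    import numpy as np
    import matplotlib.ticker as ticker

    fig, ax = plt.subplots()
    x = [0, 10, 100, 1000]
    y = [100, 20, 10, 50]
    # Shift every x value by 1 so that x = 0 lands at 1 (log(1) = 0) instead of being dropped
    x = np.asarray(x) + 1
    y = np.asarray(y)
    ax.plot(x, y)
    ax.set_xscale('log')
    ax.set_xlim(x.min(), x.max())
    # Label each tick with the original value by subtracting the shift back off
    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
    # Put the major ticks exactly at the (shifted) data points
    ax.xaxis.set_major_locator(ticker.FixedLocator(x))
    plt.show()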
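Applied to the histogram and CDF case from the question, the same +1 shift might look like the sketch below. It assumes hypothetical zero-heavy counts, the Agg backend, log-spaced bins from `np.geomspace`, and an empirical CDF computed by hand; ticks are placed at shifted round numbers and labelled with the original counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

# Hypothetical zero-heavy counts standing in for, e.g., followers per account
counts = np.array([0, 0, 0, 0, 1, 2, 3, 5, 8, 20, 35, 120, 400, 5000])
shifted = counts + 1  # 0 -> 1, which sits at log10(1) = 0 on a log axis

fig, (ax_hist, ax_cdf) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram with log-spaced bin edges from 1 up to the largest shifted value
bins = np.geomspace(1, shifted.max(), 15)
n, edges, _ = ax_hist.hist(shifted, bins=bins)
ax_hist.set_xscale('log')

# Empirical CDF: fraction of accounts with a shifted value <= each sorted value
xs = np.sort(shifted)
ecdf = np.arange(1, len(xs) + 1) / len(xs)
ax_cdf.step(xs, ecdf, where='post')
ax_cdf.set_xscale('log')

# Ticks at shifted round numbers, labelled with the original (unshifted) counts
tick_pos = np.array([0, 1, 10, 100, 1000, 10000]) + 1
for ax in (ax_hist, ax_cdf):
    ax.xaxis.set_major_locator(ticker.FixedLocator(tick_pos))
    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda v, pos: '{0:g}'.format(v - 1)))
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Because the zeros now sit in the first bin, the histogram accounts for every observation.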


Using

    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
    ax.xaxis.set_major_locator(ticker.FixedLocator(x))

relabels the tick marks with the original (unshifted) x values.
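To check what that formatter produces without drawing anything, it can be called directly on the shifted tick positions. This minimal sketch reuses the same lambda; calling a `FuncFormatter` instance with `(value, pos)` returns the label string.

```python
import numpy as np
import matplotlib.ticker as ticker

# Same formatter as in the answer: position is the shifted value, label is the original value
formatter = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x - 1))

shifted_ticks = np.array([0.0, 10.0, 100.0, 1000.0]) + 1  # the positions FixedLocator(x) uses
labels = [formatter(v, i) for i, v in enumerate(shifted_ticks)]
print(labels)  # ['0', '10', '100', '1000']
```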

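(My original suggestion was to use plt.xticks(x, x-1), but pyplot functions act on whatever the current axes happens to be. To isolate the change to one particular axes, I changed all of the calls to go through ax instead of plt.)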
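As a sketch of the Axes-level alternative to `plt.xticks(x, x-1)`, the following uses two subplots (an assumption for illustration, with the Agg backend) and changes the ticks on `ax1` only via `set_xticks` and `set_xticklabels`; `ax2` keeps its default log locator.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

x = np.array([0.0, 10.0, 100.0, 1000.0]) + 1
y = np.array([100, 20, 10, 50])

fig, (ax1, ax2) = plt.subplots(1, 2)
for ax in (ax1, ax2):
    ax.plot(x, y)
    ax.set_xscale('log')

# Axes-level equivalent of plt.xticks(x, x - 1): only ax1 is modified
ax1.set_xticks(x)
ax1.set_xticklabels(['{0:g}'.format(v - 1) for v in x])

fig.canvas.draw()  # render once so the tick labels are populated
print([t.get_text() for t in ax1.get_xticklabels()])
```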

matplotlib drops points that contain NaN, inf, or -inf values. Since log(0) is -inf, the point corresponding to x=0 is dropped from a log plot.
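The arithmetic behind this can be checked with NumPy alone. In this minimal sketch the x values are the same as in the answer's example, and `np.errstate` only silences the divide-by-zero warning that `log10(0)` raises.

```python
import numpy as np

x = np.array([0, 10, 100, 1000], dtype=float)

with np.errstate(divide='ignore'):  # silence the RuntimeWarning from log10(0)
    log_x = np.log10(x)

# log10(0) is -inf, so the x = 0 entry has no finite position on a log axis
print(log_x)
print(np.isfinite(log_x))
```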

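If you increase every x value by 1, the point that was at x=0 moves to x=1, and since log(1) = 0 it is plotted at position 0 on the log axis instead of being dropped.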

The remaining x values are shifted by one as well, but the difference is invisible to the eye, because for large x, log(x+1) is very close to log(x).
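Both points can be verified numerically with a short sketch on the answer's example values: after the shift every position is finite (the former zero sits at exactly 0), and the offset log10(x+1) - log10(x) = log10(1 + 1/x) shrinks quickly as x grows.

```python
import numpy as np

x = np.array([0, 10, 100, 1000], dtype=float)

# Positions on the log axis after the +1 shift: all finite, and the former zero is at 0
shifted = np.log10(x + 1)
print(shifted[0])  # 0.0

# Offset introduced by the shift for the nonzero values; it shrinks as x grows
offset = np.log10(x[1:] + 1) - np.log10(x[1:])
print(offset)
```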

Matplotlib log scale with zero value
