Question

I am trying to write a function for a bar plot using the following approach:

def bar_plot(data, x, y, title):
    sns.set_style('darkgrid')
    data = data.sort_values(ascending=False, by=x)
    data = data.head(n=10)
    if data[x].any() > 1000000:
        data[x] = data[x] / 1000000
        ax = sns.catplot(data=data, x=x, y=y, kind='bar')
        ax.set_xlabels(x + ' ($ Millions)', size=15)
        plt.subplots_adjust(top=0.9)
        ax.set_ylabels(y, size=15)
        ax.fig.suptitle(title, size=35)
    else:
        ax = sns.catplot(data=data, x=x, y=y, kind='bar')
        ax.set_xlabels(x, size=15)
        plt.subplots_adjust(top=0.9)
        ax.set_ylabels(y, size=15)
        ax.fig.suptitle(title, size=35)

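I want it to work with several different data frames I am using. Some of my data frames contain many large values and some contain small values. I want to divide the data frames with larger values by one million to make the plots easier to read. My initial understanding was that data[x].any() > 1000000 would find any row in the data frame above one million, return True, and then run the if block. However, it skips to the else block even when the data frame clearly has values over one million.

While trying to troubleshoot, I reversed the if statement to look for values under one million: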

def bar_plot(data, x, y, title):
    sns.set_style('darkgrid')
    data = data.sort_values(ascending=False, by=x)
    data = data.head(n=10)
    if data[x].any() < 1000000:
        ax = sns.catplot(data=data, x=x, y=y, kind='bar')
        ax.set_xlabels(x, size=15)
        plt.subplots_adjust(top=0.9)
        ax.set_ylabels(y, size=15)
        ax.fig.suptitle(title, size=35)
    else:
        data[x] = data[x] / 1000000
        ax = sns.catplot(data=data, x=x, y=y, kind='bar')
        ax.set_xlabels(x + ' ($ Millions)', size=15)
        plt.subplots_adjust(top=0.9)
        ax.set_ylabels(y, size=15)
        ax.fig.suptitle(title, size=35)

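This version now only runs the if block and never the else block, even when values exceed one million. I am confused as to why the same part of the function is the only one that runs even though the condition has been flipped.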

Answer 1

The problem is the order of operations in your condition.

This works:

(data[x]>1000000).any()

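When you write data[x].any() > 1000000, you are asking Python:

Are there any truthy (non-zero) values in my column? That only ever gives you True (1) or False (0).

Then you are asking:

Is 1 (or 0) greater than 1000000? This is always False, so you always go to the else block. The reversed test, data[x].any() < 1000000, is always True for the same reason, which is why that version always runs the if block.

Hope this clears things up!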
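To make the difference concrete, here is a minimal sketch on a small, made-up Series (the values and variable names are assumptions for illustration, not taken from the question's data). It shows that calling .any() first collapses the column to a single boolean, while comparing first keeps the per-row information:

```python
import pandas as pd

# Hypothetical column with one value above one million
s = pd.Series([250_000, 3_500_000, 40_000])

# Series.any() collapses the column to one bool: "is any value truthy (non-zero)?"
any_truthy = s.any()

# Comparing that bool with a number treats True as 1 and False as 0
wrong_gt = any_truthy > 1_000_000   # effectively 1 > 1000000
wrong_lt = any_truthy < 1_000_000   # effectively 1 < 1000000

# Comparing first yields a boolean Series; .any() then asks the intended question
right = (s > 1_000_000).any()

print(any_truthy, wrong_gt, wrong_lt, right)
```

Here wrong_gt is False and wrong_lt is True no matter how large the values in the column are, which matches the behaviour described in the question. Only right depends on the data.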

Answer 2

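Alternatively, consider using assign to create a new column that uses np.where logic to handle the one-million check and the conversion directly in the data frame. This is a more idiomatic solution for pandas than if/else blocks that treat a pandas Series like a Python list.

In addition, the code below uses Series.div in place of the division sign /, rounds to two decimals, and writes one million in E notation (1E6) to avoid counting zeros. The code is also DRY-er, since the plotting lines are not repeated.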

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def bar_plot(data, x, y, title):
    sns.set_style('darkgrid')
    data = (data.sort_values(ascending=False, by=x)
                .head(n=10)
                # ASSIGN NEW COLUMN: DIVIDE BY ONE MILLION ONLY IF ANY VALUE EXCEEDS ONE MILLION
                .assign(val = lambda d: np.where((d[x] > 1E6).any(), d[x].div(1E6), d[x]).round(2))
                .reset_index(drop=True)
            )

    # ADD UNITS TO LABEL IF CONVERSION OCCURRED (I.E., NEW COLUMN DIFFERS FROM ORIGINAL)
    xlab = x + ' ($ Millions)' if (data['val'].ne(data[x].round(2))).all() else x

    # catplot RETURNS A FacetGrid, WHICH PROVIDES set_xlabels / set_ylabels
    ax = sns.catplot(data=data, x='val', y=y, kind='bar')
    ax.set_xlabels(xlab, size=15)
    plt.subplots_adjust(top=0.9)
    ax.set_ylabels(y, size=15)
    ax.fig.suptitle(title, size=35)

    plt.show()
    plt.clf()
    plt.close()
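The conversion step can also be checked on its own, without drawing anything. The sketch below is only an illustration: scale_to_millions is a hypothetical helper name, and the sample values are made up. It relies on np.where broadcasting a scalar condition, so the whole column is either divided by one million or left unchanged.

```python
import numpy as np
import pandas as pd

def scale_to_millions(values):
    """Hypothetical helper: return (scaled Series, flag) for a numeric Series."""
    needs_scaling = bool((values > 1E6).any())
    # With a scalar condition, np.where picks the same branch for every element
    scaled = np.where(needs_scaling, values.div(1E6), values).round(2)
    return pd.Series(scaled, index=values.index), needs_scaling

# Made-up sample columns: one small, one with values above one million
small = pd.Series([1.234, 5.678, 9.0])
large = pd.Series([2_500_000, 750_000, 12_340_000])

small_scaled, small_flag = scale_to_millions(small)
large_scaled, large_flag = scale_to_millions(large)

print(small_scaled.tolist(), small_flag)
print(large_scaled.tolist(), large_flag)
```

Returning the flag alongside the values is one possible design choice: the axis label could then be chosen from the flag directly instead of by comparing the new column with the original one.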

Demonstration on random data

import numpy as np
import pandas as pd

np.random.seed(121119)
data = pd.DataFrame({'dep': np.random.uniform(1, 5E5, 15),
                     'num1': np.random.randint(1, 10, 15),         # NO VALUES OVER 1 MILLION
                     'num2': np.random.normal(2, 1, 15),           # NO VALUES OVER 1 MILLION
                     'num3': np.random.randint(int(1E3), int(1E7), 15)  # MANY VALUES OVER 1 MILLION (INTEGER BOUNDS)
                    })

bar_plot(data, 'num2', 'dep', 'my title')
bar_plot(data, 'num3', 'dep', 'my title')

Python seaborn plotting: a small problem when writing a bar plot function
