Question

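I am trying to analyze the wine-quality dataset. There are two datasets: the red wine dataset and the white wine dataset. I combined them into wine_df and want to plot it. I want the bars for red wine to be red and the bars for white wine to be white. But for some bars, the label and the color are inconsistent. For example, the fourth bar is labeled (4, white), but its color is red. What should I do? Thanks for your answer!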

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

red_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-red.csv',
                      sep = ';')
white_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-white.csv', 
                        sep = ';')

## Add a column to each data to identify the wine color 
red_wine['color'] = 'red'
white_wine['color'] = 'white'

## Combine the two dataframes    
wine_df = pd.concat([red_wine, white_wine])

colors = ['red','white']
plt.style.use('ggplot')
counts = wine_df.groupby(['quality', 'color']).count()['pH']
counts.plot(kind='bar', title='Counts by Wine Color and quality', color=colors, alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.show()

Answer 1

The color is a level of the index, so you can use it to specify the colors. Change your plotting line to:

counts.plot(kind='bar', title='Counts by Wine Color and quality', 
            color=counts.index.get_level_values(1), alpha=.7)

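In this case it just so happens that matplotlib can interpret the values in the index as colors. In general, you can map the unique values to recognizable colors, for example: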

color = counts.index.get_level_values(1).map({'red': 'green', 'white': 'black'})
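To make the mapping idea concrete, here is a minimal sketch on a small made-up frame. The `toy` data, the `palette` dictionary, and the color names in it are assumptions for illustration only, not part of the question's dataset. It builds one color per bar from that bar's own index label, so the colors cannot drift out of step with the labels:

```python
import pandas as pd

# Toy data standing in for wine_df (values are made up for illustration).
toy = pd.DataFrame({
    'quality': [3, 3, 4, 4, 4, 9],
    'color':   ['red', 'white', 'red', 'white', 'white', 'white'],
    'pH':      [3.1, 3.2, 3.3, 3.0, 3.4, 3.5],
})

counts = toy.groupby(['quality', 'color']).count()['pH']

# Hypothetical palette: one matplotlib color name per wine color.
palette = {'red': 'crimson', 'white': 'lightgray'}

# One color per bar, looked up from each bar's own index label.
bar_colors = [palette[c] for c in counts.index.get_level_values('color')]

for label, color in zip(counts.index, bar_colors):
    print(label, color)
```

The resulting `bar_colors` list has the same length and order as `counts`, so it can be passed as the `color=` argument of the bar plot.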

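pandas is doing something with the plotting order here, but you can always fall back to matplotlib to cycle through the colors more reliably. The trick is to convert color to a categorical variable so that both categories are always represented for every quality after the groupby, which lets you specify just the list ['red', 'white']: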

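import numpy as np
import matplotlib.pyplot as plt

wine_df['color'] = wine_df.color.astype('category')
# observed=False keeps (quality, color) pairs that have no rows, so the bars alternate strictly
counts = wine_df.groupby(['quality', 'color'], observed=False).count()['pH'].fillna(0)

ind = np.arange(len(counts))
plt.bar(ind, height=counts.values, color=['red', 'white'])  # the two colors are cycled across the bars
_ = plt.xticks(ind, counts.index.values, rotation=90)
plt.ylim(0, 150)  # So we can see (9, white)
plt.show()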
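To illustrate why the categorical conversion matters, here is a small sketch on made-up data (the `toy` frame and its values are assumptions, not the wine dataset). With a plain string column, combinations that never occur are simply absent from the result, so a repeating two-color list can fall out of step with the labels. With a categorical column and `observed=False`, every quality gets both a red and a white entry:

```python
import pandas as pd

# Toy data: quality 3 has no white wine, quality 9 has no red wine.
toy = pd.DataFrame({
    'quality': [3, 4, 4, 9],
    'color':   ['red', 'red', 'white', 'white'],
    'pH':      [3.1, 3.3, 3.0, 3.5],
})

# Plain string column: only combinations present in the data appear.
plain = toy.groupby(['quality', 'color']).count()['pH']

# Categorical column: observed=False keeps every (quality, color) pair.
toy['color'] = toy['color'].astype('category')
full = toy.groupby(['quality', 'color'], observed=False).count()['pH'].fillna(0)

print(list(plain.index.get_level_values('color')))
print(list(full.index.get_level_values('color')))
```

In `full`, the color level strictly alternates between red and white, which is what makes a cycled `['red', 'white']` list line up with the bar labels.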
