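I am trying to analyze the wine-quality dataset. There are two datasets: the red wine dataset and the white wine dataset. I combined them into wine_df and want to plot it, with the bars for red wine drawn in red and the bars for white wine drawn in white. For some bars, however, the label and the color do not match. For example, the fourth bar is labeled (4, white) but is drawn in red. What should I do? Thanks for your answers!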
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
red_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-red.csv',
sep = ';')
white_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-white.csv',
sep = ';')
## Add a column to each data to identify the wine color
red_wine['color'] = 'red'
white_wine['color'] = 'white'
## Combine the two dataframes
wine_df = pd.concat([red_wine, white_wine])
colors = ['red','white']
plt.style.use('ggplot')
counts = wine_df.groupby(['quality', 'color']).count()['pH']
counts.plot(kind='bar', title='Counts by Wine Color and quality', color=colors, alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.show()
Answer 0 (score: 1)
The color is a level of the index, so you can use it to specify the bar colors. Change your line of code to:
counts.plot(kind='bar', title='Counts by Wine Color and quality',
color=counts.index.get_level_values(1), alpha=.7)
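In this case it just so happens that matplotlib can interpret the values in the index as colors. In general, you can map the unique values to recognizable colors, for example: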
color = counts.index.get_level_values(1).map({'red': 'green', 'white': 'black'})
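As a rough, self-contained sketch of that mapping idea: the snippet below builds one color per bar from the second index level and passes the full list to the plot. The toy_df frame, its values, and the palette dictionary are made up for illustration and are not part of the wine data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for wine_df (hypothetical values, for illustration only).
toy_df = pd.DataFrame({
    "quality": [3, 3, 4, 4, 4, 9],
    "color": ["red", "white", "red", "white", "white", "white"],
    "pH": [3.1, 3.2, 3.3, 3.0, 3.4, 3.2],
})

counts = toy_df.groupby(["quality", "color"]).count()["pH"]

# Hypothetical palette: index label -> color name that matplotlib understands.
palette = {"red": "firebrick", "white": "lightgray"}

# One color per bar, looked up from the bar's own label, so a missing
# (quality, color) pair cannot shift the colors out of step with the labels.
bar_colors = [palette[c] for c in counts.index.get_level_values("color")]

ax = counts.plot(kind="bar", color=bar_colors, alpha=0.7)
ax.set_xlabel("Quality and Color")
ax.set_ylabel("Count")
```

Call plt.show() afterwards to display the figure. Because the list has exactly one entry per bar, nothing depends on how a shorter color list would be cycled.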
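pandas is doing something with the order of the plotting, but you can always fall back to matplotlib to cycle through the colors more reliably. The trick here is to convert color to a categorical variable so that every category is always represented after the groupby, even for a quality value that has no wines of that color. That lets you specify only the list ['red', 'white']: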
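import numpy as np
import matplotlib.pyplot as plt
# Make color categorical so every (quality, color) pair appears after the groupby
wine_df['color'] = wine_df.color.astype('category')
# observed=False keeps empty combinations such as (9, red), with a count of 0
counts = wine_df.groupby(['quality', 'color'], observed=False).count()['pH'].fillna(0)
ind = np.arange(len(counts))
# With every pair present, the two colors alternate in step with the labels
plt.bar(ind, height=counts.values, color=['red', 'white'])
_ = plt.xticks(ind, counts.index.values, rotation=90)
plt.ylim(0,150) # So we can see (9, white)
plt.show()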
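To see why the categorical conversion matters, here is a small sketch on made-up data (toy_df and its values are hypothetical). It compares the grouped index before and after the conversion. It assumes a pandas version in which groupby accepts the observed argument.

```python
import pandas as pd

# Toy data with gaps: no white wine at quality 3, no red wine at quality 9.
toy_df = pd.DataFrame({
    "quality": [3, 4, 4, 9],
    "color": ["red", "red", "white", "white"],
    "pH": [3.1, 3.3, 3.0, 3.2],
})

# Plain string column: only combinations that actually occur are kept.
plain = toy_df.groupby(["quality", "color"]).count()["pH"]

# Categorical column with observed=False: every category is kept for
# every quality value, and the empty combinations get a count of 0.
toy_df["color"] = toy_df["color"].astype("category")
full = (
    toy_df.groupby(["quality", "color"], observed=False)
    .count()["pH"]
    .fillna(0)
)

print(list(plain.index))
print(list(full.index))
print(full.tolist())
```

In the plain result the red and white entries do not strictly alternate, so a two-color list cycled over the bars would drift out of step with the labels. In the full result they do alternate, which is what makes color=['red', 'white'] safe.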