How do I plot a histogram of one column colored by another column in Python?

Time: 2018-06-27 08:27:55

Tags: python pandas histogram visualization data-visualization

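I have a dataset with 3 columns titled Gender (M/F), House (A/B/C), and Indicator (0 or 1). I want to plot a histogram of House A colored by gender. This is the code I use to do that: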

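import pandas as pd

# Load only the three columns of interest.
df = pd.read_csv('dataset.csv', usecols=['House', 'Gender', 'Indicator'])

# Keep the rows for House A, then the two columns needed for the plot.
A = df[df['House'] == 'A']
A = pd.DataFrame(A, columns=['Indicator', 'Gender'])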

This correctly imports the gender values for House A, as its contents show:

print(A)
            Indicator    Gender
0                   1      Male
1                   1      Male
2                   1      Male
4                   1    Female
7                   1      Male
8                   1      Male
11                  1      Male
14                  1      Male
17                  1      Male
18                  1    Female
19                  1    Female
20                  1    Female
21                  1      Male
24                  1      Male
26                  1    Female
27                  1      Male
...               ...       ...

Now, when I try to plot the histogram of A colored by gender the way I would in MATLAB, I get an error:

import matplotlib.pyplot as plt
plt.hist(A)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-81c3aef1748b> in <module>()
----> 1 plt.hist(A)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
   3130                       histtype=histtype, align=align, orientation=orientation,
   3131                       rwidth=rwidth, log=log, color=color, label=label,
-> 3132                       stacked=stacked, normed=normed, data=data, **kwargs)
   3133     finally:
   3134         ax._hold = washold

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1853                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1854                         RuntimeWarning, stacklevel=2)
-> 1855             return func(ax, *args, **kwargs)
   1856 
   1857         inner.__doc__ = _add_data_doc(inner.__doc__,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
   6512             for xi in x:
   6513                 if len(xi) > 0:
-> 6514                     xmin = min(xmin, xi.min())
   6515                     xmax = max(xmax, xi.max())
   6516             bin_range = (xmin, xmax)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29     return umr_minimum(a, axis, None, out, keepdims)
     30 
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: '<=' not supported between instances of 'int' and 'str'

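It seems we need to specify exactly which column to histogram. Unlike MATLAB, it does not automatically understand that it should color the bars by the other column. So the following plots the histogram, but with no color indicating gender: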

plt.hist(A['Indicator'])

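So how do I make a stacked histogram, or a side-by-side histogram, colored by gender? Something like this, except with only 2 bars for each indicator, at x = 0 and x = 1: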

import numpy as np

x = np.random.randn(1000, 2)

colors = ['red', 'green']
plt.hist(x, color=colors)
plt.legend(['Male', 'Female'])
plt.title('Male and Female indicator by gender')

I tried to mimic the above by copying the 2 DataFrame columns into the 2 elements of a list and then plotting the histogram:

y=[]
y[0] = A[A['Gender'=='M']].tolist()
y[1] = A[A['Gender'=='F']].tolist()
plt.hist(y)

But this produces the following error:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-152-138cb74b6e00> in <module>()
      2 A= pd.DataFrame(A, columns=['Indicator', 'Gender'])
      3 y=[]
----> 4 y[0] = A[A['Gender'=='M']].tolist()
      5 y[1] = A[A['Gender'=='F']].tolist()
      6 plt.hist(y)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2683             return self._getitem_multilevel(key)
   2684         else:
-> 2685             return self._getitem_column(key)
   2686 
   2687     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2690         # get column
   2691         if self.columns.is_unique:
-> 2692             return self._get_item_cache(key)
   2693 
   2694         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3063                 return self._engine.get_loc(key)
   3064             except KeyError:
-> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3066 
   3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

1 answer:

Answer 0 (score: 1)

The following should work, but I have not tested it on your data.

genders = A.Gender.unique()
plt.hist([A.loc[A.Gender == x, 'Indicator'] for x in genders], label=genders)
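
For a self-contained check of the snippet above, here is a minimal sketch on made-up data. The sample frame, the helper name `indicator_by_gender`, the explicit bin edges, and the conversion to NumPy arrays are all assumptions added for illustration; none of them come from the original dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the House A subset from the question.
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1, 1, 0],
    "Gender": ["Male", "Male", "Male", "Female",
               "Female", "Male", "Female", "Female"],
})


def indicator_by_gender(frame):
    """Return the gender labels and one array of Indicator values per gender."""
    genders = list(frame["Gender"].unique())
    groups = [
        frame.loc[frame["Gender"] == g, "Indicator"].to_numpy()
        for g in genders
    ]
    return genders, groups


genders, groups = indicator_by_gender(A)

fig, ax = plt.subplots()
# Bin edges chosen (as an assumption) so that 0 and 1 each get their own bin.
# Passing a list of arrays draws one bar per gender side by side in each bin;
# adding stacked=True would stack them instead.
counts, edges, _ = ax.hist(groups, bins=[-0.5, 0.5, 1.5], label=genders)
ax.set_xticks([0, 1])
ax.set_xlabel("Indicator")
ax.set_ylabel("Count")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure. The explicit `bins` argument is a design choice for a 0/1 column: with the default ten bins the two values would sit at the far edges of the axis.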

Your code fails at A[A['Gender'=='M']] because it should be A[A['Gender'] == 'M'] to get the Male elements, but you also need to select the column you want.
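
The reason for `KeyError: False` can be reproduced in isolation: Python evaluates `'Gender' == 'M'` first, as a plain comparison of two string literals, so pandas ends up being asked for a column named `False`. The sketch below uses invented data and hypothetical variable names (`key`, `lookup_failed`, `male`, `female`); it also filters on 'Male'/'Female' because those are the labels shown in the question's printed output, which is an assumption about the real file.

```python
import pandas as pd

# Hypothetical data, not the original dataset.
A = pd.DataFrame({
    "Indicator": [1, 0, 1, 1],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# The comparison inside the brackets never touches the DataFrame.
key = ("Gender" == "M")  # plain string comparison, evaluates to False

try:
    A[A[key]]  # same as A[A[False]]: looks up a column called False
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Corrected form: build a boolean mask from the column,
# then select the column to plot.
male = A.loc[A["Gender"] == "Male", "Indicator"].tolist()
female = A.loc[A["Gender"] == "Female", "Indicator"].tolist()

# Build the list directly; assigning y[0] = ... to an empty list
# would raise IndexError.
y = [male, female]
```

The resulting `y` can be passed to `plt.hist(y, label=['Male', 'Female'])` in the same way as the list comprehension in the answer.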