I have a dataset with 3 columns titled Gender (M or F), House (A or B or C), and Indicator (0 or 1). I want to plot a histogram of house A colored by gender. Here is my code for doing this:
import pandas as pd
df = pd.read_csv('dataset.csv', usecols=['House', 'Gender', 'Indicator'])
A = df[df['House']=='A']
A = pd.DataFrame(A, columns=['Indicator', 'Gender'])
This correctly imports the corresponding gender values for house A, as its contents show:
print(A)
Indicator Gender
0 1 Male
1 1 Male
2 1 Male
4 1 Female
7 1 Male
8 1 Male
11 1 Male
14 1 Male
17 1 Male
18 1 Female
19 1 Female
20 1 Female
21 1 Male
24 1 Male
26 1 Female
27 1 Male
... ... ...
Now, when I try to plot the histogram of A colored by gender, the way I would in MATLAB, I get an error:
import matplotlib.pyplot as plt
plt.hist(A)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-130-81c3aef1748b> in <module>()
----> 1 plt.hist(A)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
3130 histtype=histtype, align=align, orientation=orientation,
3131 rwidth=rwidth, log=log, color=color, label=label,
-> 3132 stacked=stacked, normed=normed, data=data, **kwargs)
3133 finally:
3134 ax._hold = washold
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1853 "the Matplotlib list!)" % (label_namer, func.__name__),
1854 RuntimeWarning, stacklevel=2)
-> 1855 return func(ax, *args, **kwargs)
1856
1857 inner.__doc__ = _add_data_doc(inner.__doc__,
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
6512 for xi in x:
6513 if len(xi) > 0:
-> 6514 xmin = min(xmin, xi.min())
6515 xmax = max(xmax, xi.max())
6516 bin_range = (xmin, xmax)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29 return umr_minimum(a, axis, None, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
TypeError: '<=' not supported between instances of 'int' and 'str'
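It seems we need to specify the exact column to histogram. Unlike MATLAB, it does not automatically understand that it should color by the other column. So the following plots the histogram, but without any color to indicate gender: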
plt.hist(A['Indicator'])
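So, how do I make a stacked histogram, or a side-by-side histogram, colored by gender? Something like this, except with only 2 bars for each indicator value, at x = 0 and x = 1:
import numpy as np
x = np.random.randn(1000, 2)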
colors = ['red', 'green']
plt.hist(x, color=colors)
plt.legend(['Male', 'Female'])
plt.title('Male and Female indicator by gender')
I tried to imitate the above by copying the 2 dataframe columns into the 2 columns of a list and then plotting the histogram:
y=[]
y[0] = A[A['Gender'=='M']].tolist()
y[1] = A[A['Gender'=='F']].tolist()
plt.hist(y)
But this produces the following error:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3062 try:
-> 3063 return self._engine.get_loc(key)
3064 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-152-138cb74b6e00> in <module>()
2 A= pd.DataFrame(A, columns=['Indicator', 'Gender'])
3 y=[]
----> 4 y[0] = A[A['Gender'=='M']].tolist()
5 y[1] = A[A['Gender'=='F']].tolist()
6 plt.hist(y)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2683 return self._getitem_multilevel(key)
2684 else:
-> 2685 return self._getitem_column(key)
2686
2687 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2690 # get column
2691 if self.columns.is_unique:
-> 2692 return self._get_item_cache(key)
2693
2694 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2484 res = cache.get(item)
2485 if res is None:
-> 2486 values = self._data.get(item)
2487 res = self._box_item_values(item, values)
2488 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3063 return self._engine.get_loc(key)
3064 except KeyError:
-> 3065 return self._engine.get_loc(self._maybe_cast_indexer(key))
3066
3067 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
Answer 0 (score: 1)
The following should work, but I have not tested it on your data.
genders = A.Gender.unique()
plt.hist([A.loc[A.Gender == x, 'Indicator'] for x in genders], label=genders)
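To make the idea above concrete, here is a minimal self-contained sketch. The small DataFrame is a hypothetical stand-in for the rows of house A, not the asker's data, and the bin edges, colors, and axis labels are my own assumptions, chosen so that the values 0 and 1 each get exactly one bar per gender.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the rows of house A (not the asker's real data).
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1, 1, 0],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male"],
})

genders = A["Gender"].unique()  # order of first appearance: Male, Female

# One array of Indicator values per gender.
groups = [A.loc[A["Gender"] == g, "Indicator"].to_numpy() for g in genders]

# Assumed bin edges: 0 falls in [-0.5, 0.5) and 1 falls in [0.5, 1.5].
bins = [-0.5, 0.5, 1.5]

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(
    groups,
    bins=bins,
    label=list(genders),
    color=["red", "green"],
    stacked=False,  # side-by-side bars; set to True for stacked bars
)
ax.set_xticks([0, 1])
ax.set_xlabel("Indicator")
ax.set_ylabel("Count")
ax.legend()
ax.set_title("Indicator by gender, house A")
```

Passing stacked=True should give the stacked variant with the same data; call plt.show() afterwards when running as a script.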
Your code fails on A[A['Gender'=='M']] because it should be A[A['Gender'] == 'M'] to get the male rows, and you also need to select the column you want, here Indicator.
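The KeyError: False in the question can be reproduced in isolation. The sketch below uses a small hypothetical DataFrame (not the asker's data) to show that 'Gender' == 'M' compares two string literals and evaluates to False before pandas ever sees it, whereas comparing the column yields a boolean mask. It also builds the list with a comprehension, because assigning to y[0] on an empty list would raise an IndexError, and because a DataFrame has no tolist method, so a single column has to be selected first.

```python
import pandas as pd

# Hypothetical sample mirroring the structure of A (not the asker's data).
A = pd.DataFrame({
    "Indicator": [1, 0, 1, 1],
    "Gender": ["M", "F", "M", "F"],
})

# Comparing two string literals yields a plain bool, so A['Gender' == 'M']
# is really A[False], which pandas treats as a lookup of a column named False.
key = ('Gender' == 'M')

try:
    A[key]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Comparing the column itself gives a boolean Series usable as a row mask.
mask = A["Gender"] == "M"
male_indicator = A.loc[mask, "Indicator"].tolist()

# Build the per-gender lists directly instead of assigning into an empty list.
y = [A.loc[A["Gender"] == g, "Indicator"].tolist() for g in ["M", "F"]]
```

The resulting y is a list of two plain Python lists, which can be passed to plt.hist in the same way as the random two-column array in the question.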