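I'm having trouble generating a basic distribution histogram from an imported CSV file. The code works for a set of data from another CSV, but not for the data I'm interested in, which is essentially the same. Here is the code I tried: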
import pandas as pd
import numpy as np
import matplotlib as plt
data = pd.read_csv("idcases.csv")
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
data2 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Sonoma")]
fig = plt.pyplot.figure()
ax = fig.add_subplot(111)
ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.pyplot.xlabel('Population')
plt.pyplot.ylabel('Count of Population')
plt.pyplot.show()
Which yields:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-35-63303aa9d8a5> in <module>()
1 fig = plt.pyplot.figure()
2 ax = fig.add_subplot(111)
----> 3 ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
4 plt.pyplot.xlabel('Count')
5 plt.pyplot.ylabel('Count of Population')
C:\Program Files (x86)\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5602 # Massage 'x' for processing.
5603 # NOTE: Be sure any changes here is also done below to 'weights'
-> 5604 if isinstance(x, np.ndarray) or not iterable(x[0]):
5605 # TODO: support masked arrays;
5606 x = np.asarray(x)
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
549 def __getitem__(self, key):
550 try:
--> 551 result = self.index.get_value(self, key)
552
553 if not np.isscalar(result):
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1721
1722 try:
-> 1723 return self._engine.get_value(s, k)
1724 except KeyError as e1:
1725 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0L
What am I doing wrong? Here is part of the data I'm reading. The code doesn't work for any field, including "Count" or "Rate".
Disease County Year Sex Count Population Rate CI.lower \
882 Amebiasis Marin 2001 Total 14 247731 5.651 3.090
883 Amebiasis Marin 2001 Female 0 125414 0.000 0.000
884 Amebiasis Marin 2001 Male 0 122317 0.000 0.000
885 Amebiasis Marin 2002 Total 7 247382 2.830 1.138
886 Amebiasis Marin 2002 Female 0 125308 0.000 0.000
887 Amebiasis Marin 2002 Male 0 122074 0.000 0.000
888 Amebiasis Marin 2003 Total 9 247280 3.640 1.664
889 Amebiasis Marin 2003 Female 0 125259 0.000 0.000
890 Amebiasis Marin 2003 Male 0 122021 0.000 0.000
Answer 0 (score: 1)
After upgrading from matplotlib v1.4.3 to matplotlib v1.5.0, I noticed that plotting a pandas.Series stopped working, for example:
ax.plot_date(df['date'], df['raw'], '.-', label='raw')
This raises a KeyError: 0 exception.
You need to pass a numpy.ndarray rather than a pandas.Series to the plot_date function:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
Let's look at the full traceback of the exception:
# ... PREVIOUS TRACEBACK MESSAGES OMITTED FOR BREVITY ...
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\matplotlib\dates.py in default_units(x, axis)
1562
1563 try:
-> 1564 x = x[0]
1565 except (TypeError, IndexError):
1566 pass
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1788
1789 try:
-> 1790 return self._engine.get_value(s, k)
1791 except KeyError as e1:
1792 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0
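Note that the error occurs when matplotlib tries to execute x = x[0]. If your pandas.Series is not indexed with integers starting from zero, this fails, because the lookup searches for the item whose index label is 0 rather than the element at position 0 of the pandas.Series.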
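The difference between label lookup and positional lookup can be reproduced without matplotlib. The following is a minimal sketch, assuming a hypothetical Series `s` whose index labels start at 882, like the filtered rows in the question. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical Series with index labels 882..884 (no label 0), mimicking
# what remains after filtering the question's DataFrame.
s = pd.Series([247731, 125414, 122317], index=[882, 883, 884])

try:
    s[0]  # on an integer index, [] looks up the LABEL 0, which is absent
    lookup_failed = False
except KeyError:
    lookup_failed = True

first_by_position = s.iloc[0]   # positional access, independent of labels
first_from_array = s.values[0]  # the underlying NumPy array is always positional

print(lookup_failed, first_by_position, first_from_array)
```

Both `.iloc[0]` and `.values[0]` return the first element regardless of the index labels, which is why handing matplotlib the NumPy array sidesteps the failing `x[0]`.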
To fix this, we need to get a numpy.ndarray from the data in the pandas.Series and then use that for plotting:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
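The same workaround should carry over to the histogram in the question. The sketch below assumes a stand-in DataFrame built from the Population values shown in the question, with index labels 882 to 890. The `Agg` backend is chosen here only so the example runs without a display, and is not part of the fix.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for data1 from the question: index labels start at 882, not 0.
data1 = pd.DataFrame(
    {"Population": [247731, 125414, 122317, 247382, 125308,
                    122074, 247280, 125259, 122021]},
    index=range(882, 891),
)

# Convert to a NumPy array so matplotlib indexes it by position.
population = data1["Population"].values

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(
    population, bins=10, range=(population.min(), population.max())
)
ax.set_xlabel("Population")
ax.set_ylabel("Count of Population")
```

In an interactive session, calling `plt.show()` afterwards displays the figure.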
Answer 1 (score: 0)
It plots for me:
import io
import matplotlib.pyplot as plt
import pandas as pd
s = """ Disease County Year Sex Count Population Rate CI.lower
Amebiasis Marin 2001 Total 14 247731 5.651 3.090
Amebiasis Marin 2001 Female 0 125414 0.000 0.000
Amebiasis Marin 2001 Male 0 122317 0.000 0.000
Amebiasis Marin 2002 Total 7 247382 2.830 1.138
Amebiasis Marin 2002 Female 0 125308 0.000 0.000
Amebiasis Marin 2002 Male 0 122074 0.000 0.000
Amebiasis Marin 2003 Total 9 247280 3.640 1.664
Amebiasis Marin 2003 Female 0 125259 0.000 0.000
Amebiasis Marin 2003 Male 0 122021 0.000 0.000 """
fobj = io.StringIO(s)
data1 = pd.read_csv(fobj, sep=r"\s+")  # whitespace-delimited; delim_whitespace is deprecated in recent pandas
plt.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.xlabel('Population')
plt.ylabel('Count of Population')
plt.show()
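A likely reason this version plots while the question's code does not: reading the nine rows on their own gives the DataFrame a fresh index starting at 0, whereas filtering a larger frame keeps the original row labels (882 and up). The sketch below illustrates that with a small hypothetical frame. `reset_index(drop=True)` is shown as one possible alternative to passing `.values`, not as something either answer proposed.

```python
import pandas as pd

# Hypothetical frame standing in for the full CSV; the Marin rows do not
# sit at label 0.
data = pd.DataFrame(
    {"County": ["Alameda", "Marin", "Marin"],
     "Population": [1000, 247731, 125414]}
)

filtered = data[data["County"] == "Marin"]    # keeps original labels 1 and 2
renumbered = filtered.reset_index(drop=True)  # relabels the rows 0 and 1

first_value = renumbered["Population"][0]     # label 0 exists again
print(list(filtered.index), list(renumbered.index), first_value)
```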