Pandas KeyError: year column of a DataFrame read from a CSV file

Time: 2016-11-05 22:37:37

Tags: python pandas

I have a DataFrame like this:

BirthYear    Sex    Area      Count
2015         W      Dhaka     6
2015         M      Dhaka     3
2015         W      Khulna    1
2015         M      Khulna    8
2014         M      Dhaka     13
2014         W      Dhaka     20
2014         M      Khulna    9
2014         W      Khulna    6
2013         W      Dhaka     11
2013         M      Dhaka     2
2013         W      Khulna    8
2013         M      Khulna    5
2012         M      Dhaka     12
2012         W      Dhaka     4
2012         W      Khulna    7
2012         M      Khulna    1

Now I want to create a bar chart in pandas that shows only the males and females born in 2015. My code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('out.csv')
df=df.reset_index()
df=df.loc[df["BirthYear"]==2015]
agg_df = df.groupby(['Sex']).sum()
agg_df.reset_index(inplace=True)
piv_df = agg_df.pivot(columns='Sex', values='Count')
piv_df.plot.bar(stacked=True)
plt.show()

After running it, IDLE shows this error:

Traceback (most recent call last):
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'BirthYear'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/sabid/Dropbox/Freelancing/data visualization python/pie.py", line 8, in <module>
    df=df.loc[df["StichtagDatJahr"]==2015]
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'BirthYear'

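From this link I learned that this happens because there is something extra in the header in front of the 'BirthYear' column name. But I don't know how to remove it and make the code work. Is there an efficient solution for this?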

2 answers:

Answer 0 (score: 1)

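You can rename the columns. Note that rename expects a mapping from old names to new names, so to overwrite all the names at once, assign a list to df.columns right after reading the file (the list must contain exactly one name per column):

df.columns = ["BirthYear", "Sex", "Area", "Count"]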
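If you would rather see what is actually wrong with the header before overwriting it, the following is a minimal sketch. The real out.csv is not shown in the question, so the messy names here (a byte order mark in front of BirthYear and stray spaces around other names) are a hypothetical stand-in for whatever the file really contains.

```python
import pandas as pd

# Hypothetical stand-in for the real file: the header names carry a BOM and
# stray spaces, so df["BirthYear"] would raise KeyError on this frame.
df = pd.DataFrame(
    [[2015, "W", "Dhaka", 6], [2015, "M", "Dhaka", 3]],
    columns=["\ufeffBirthYear", " Sex", "Area ", "Count"],
)

# repr() makes invisible characters in the column names visible.
print([repr(c) for c in df.columns])

# Remove the BOM, then strip surrounding whitespace from every column name.
df.columns = df.columns.str.replace("\ufeff", "", regex=False).str.strip()

# The lookup from the question now works.
born_2015 = df.loc[df["BirthYear"] == 2015]
print(born_2015)
```

Depending on what the repr output shows for the real file, passing encoding="utf-8-sig" or skipinitialspace=True to pd.read_csv may fix the names at load time instead.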

Answer 1 (score: 0)

I assume you want output like this:

Barplot

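I'm not sure about this, but I think using the pivot method is what is tripping you up. You don't need pivot, because agg_df is basically already a pivot table. Here is the code I used to create that chart: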

import matplotlib.pyplot as plt
import pandas as pd

# I made this to approximate your CSV file.
table = {
    'BirthYear': [2015, 2015, 2015, 2015, 2014, 2014],
    'Sex': ['W', 'M', 'W', 'M', 'M', 'W'],
    'Area': ['Dhaka', 'Dhaka', 'Khulna', 'Khulna', 'Dhaka', 'Dhaka'],
    'Count': [6, 3, 1, 8, 13, 20],
}

df = pd.DataFrame(table)

# Select people born in 2015.
df = df.loc[df["BirthYear"] == 2015]

# Sum the counts per sex. This is basically a pivot table.
agg_df = df.groupby('Sex')[['Count']].sum()

# Make the plot and display it.
agg_df['Count'].plot.bar()
plt.show()
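To illustrate why the extra pivot step is redundant, here is a small sketch that computes the 2015 totals per sex twice, once with groupby and once with pivot_table, using the same approximated data as above. The variable names born_2015, by_groupby and by_pivot are only for illustration. Both approaches should give 11 for M and 7 for W.

```python
import pandas as pd

# Same approximation of the CSV file as in the answer above.
df = pd.DataFrame({
    "BirthYear": [2015, 2015, 2015, 2015, 2014, 2014],
    "Sex": ["W", "M", "W", "M", "M", "W"],
    "Area": ["Dhaka", "Dhaka", "Khulna", "Khulna", "Dhaka", "Dhaka"],
    "Count": [6, 3, 1, 8, 13, 20],
})
born_2015 = df.loc[df["BirthYear"] == 2015]

# Totals per sex via groupby: the result is already indexed by Sex.
by_groupby = born_2015.groupby("Sex")["Count"].sum()

# The same totals via pivot_table, which aggregates duplicate keys itself.
by_pivot = born_2015.pivot_table(index="Sex", values="Count", aggfunc="sum")["Count"]

print(by_groupby.to_dict())
print(by_pivot.to_dict())
```

Either result can be passed straight to .plot.bar(), so no further reshaping is needed for a one-bar-per-sex chart.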