KeyError when setting the index of a Pandas DataFrame

Date: 2017-08-24 11:07:44

Tags: python excel pandas dataframe openpyxl

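I get an error when I try to set the index of a DataFrame. I have set indexes the same way before without hitting this, so I would like to know what is going wrong. The data comes without column headers, so the DataFrame headers are 0, 1, 2, 3, and so on. Any of the column headers causes the error.

I get KeyError: '0' when I try to use the first column (which I want to use as the unique index).

Context: in the example below, I select macro-enabled Excel spreadsheets, trim the data, read it, and convert it into a DataFrame.

I then want to add the file name as a column, set the index, and strip whitespace so that I can use index labels to extract the data I need. Not every worksheet has those index labels, so I use try/except to skip the worksheets whose index does not contain them. Finally, I want to concatenate the results into one DataFrame and drop the unused columns.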

import itertools
import glob
from openpyxl import load_workbook
from pandas import DataFrame
import pandas as pd
import os

def get_data(ws):
    # Yield each row starting from its first non-empty cell.
    for row in ws.values:
        row_it = iter(row)
        for cell in row_it:
            if cell is not None:
                yield itertools.chain((cell,), row_it)
                break

def read_workbook(file_):
    wb = load_workbook(file_, data_only=True)
    # After the loop, ws refers to the last worksheet in the workbook.
    for sheet in wb.worksheets:
        ws = sheet
    return DataFrame(get_data(ws))

path = r'dir'
allFiles = glob.glob(path + "/*.xlsm")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    parsed_file = read_workbook(file_)
    parsed_file['filename'] = os.path.basename(file_)
    parsed_file.set_index(['0'], inplace = True)  # raises KeyError: '0'
    parsed_file.index.str.strip()
    try:
        parsed_file.loc["Staff" : "Total"].copy()
        list_.append(parsed_file)
    except KeyError:
        pass

frame = pd.concat(list_)
print(frame.dropna(axis='columns', thresh=2, inplace = True))

Example DataFrame, the desired index position, and the labels to extract between.

     index
     0          1   2 
0    5          2   4
1    RTJHD      5   9
2    ABCD       4   6
3    Staff      9   3 --- extract from here
4    FHDHSK     3   2
5    IRRJWK     7   1
6    FJDDCN     1   8
7    67         4   7
8    Total      5   3 --- to here

Error

Traceback (most recent call last):

  File "<ipython-input-29-d8fd24ca84ec>", line 1, in <module>
    runfile('dir.py', wdir='C:/dir/Documents')

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "dir.py", line 36, in <module>
    parsed_file.set_index(['0'], inplace = True)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2830, in set_index
    level = frame[col]._values

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)

  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)

KeyError: '0'

1 answer:

Answer 0 (score: 1)

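You get this error because your DataFrame was read in without any headers. That means the column labels are integers, not strings, so the string '0' matches no column. The columns are of type Int64Index: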

Int64Index([0, 1, 2, 3, ...], dtype='int64')
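To make the mismatch concrete, here is a minimal sketch that reproduces it on a small made-up frame. The rows are hypothetical stand-ins for one parsed worksheet, and depending on your pandas version the columns may be displayed as a RangeIndex rather than an Int64Index. Either way, the labels are integers.

```python
import pandas as pd

# Hypothetical rows standing in for one parsed worksheet (no header row).
rows = [("Staff", 9, 3), ("FHDHSK", 3, 2), ("Total", 5, 3)]
df = pd.DataFrame(rows)

# With no headers supplied, pandas labels the columns with integers.
column_labels = list(df.columns)

# The string '0' is not one of those labels, so this lookup fails.
try:
    df.set_index(["0"])
    string_label_works = True
except KeyError:
    string_label_works = False

# The integer label 0 refers to the first column and works.
indexed = df.set_index(0)
```

The only difference between the failing and the working call is the type of the label: `"0"` versus `0`.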

At this point, I would suggest simply accessing df.columns by position whenever you are forced to deal with them:

parsed_file.set_index(parsed_file.columns[0], inplace = True)
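As a rough end-to-end illustration, the sketch below applies that line to a frame shaped like the sample data in the question, then strips whitespace and slices from "Staff" to "Total". The rows are typed in by hand rather than read from Excel, and the padding around " Staff " is an assumption added to show the strip step. Note that `index.str.strip()` returns a new index, so the result has to be assigned back for it to take effect.

```python
import pandas as pd

# Hand-typed stand-in for the sample data in the question; the spaces
# around " Staff " are an assumption, added to exercise the strip step.
rows = [
    (5, 2, 4),
    ("RTJHD", 5, 9),
    ("ABCD", 4, 6),
    (" Staff ", 9, 3),
    ("FHDHSK", 3, 2),
    ("IRRJWK", 7, 1),
    ("FJDDCN", 1, 8),
    (67, 4, 7),
    ("Total", 5, 3),
]
parsed_file = pd.DataFrame(rows)

# Use the first column as the index without hard-coding its label.
parsed_file = parsed_file.set_index(parsed_file.columns[0])

# The first column mixes numbers and text, so convert the labels to
# strings before stripping, and assign the stripped index back.
parsed_file.index = parsed_file.index.astype(str).str.strip()

# Label-based slicing includes both endpoints.
subset = parsed_file.loc["Staff":"Total"]
```

Slicing by label like this relies on the "Staff" and "Total" labels each appearing once in the index. A worksheet that lacks one of them raises KeyError, which is what the try/except in the question is meant to catch.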

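If you access columns by position like this, you do not hard-code column names. An alternative is to assign column names of your own and then refer to those.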
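The alternative could look something like the following sketch. The names "label", "value_a" and "value_b" are hypothetical, chosen only for illustration, and the list must have exactly one name per column in the frame.

```python
import pandas as pd

# Hypothetical rows standing in for one parsed worksheet (no header row).
rows = [("Staff", 9, 3), ("FHDHSK", 3, 2), ("Total", 5, 3)]
df = pd.DataFrame(rows)

# Assign your own column names (hypothetical ones here), one per column.
df.columns = ["label", "value_a", "value_b"]

# String names now exist, so set_index can refer to them directly.
df = df.set_index("label")
```

This keeps later code readable, at the cost of having to know the number of columns up front.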