Question

I have a pandas DataFrame pd1:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
                  header=None, names  = ['date-time','domain','requests-qty','response-bytes'],
                   parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')

with the index:

>> pd1.index:  

 DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                ...
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00'],
               dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)

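But when I try to set the index to that column, I get the error below. (I originally wanted a multi-column index and hit this error. I then created another DataFrame with other columns as the index, pd_new_index = pd1.set_index(['requests-qty','domain']), which worked, and tried to build a new frame from it with 'date-time' set back as the index, pd_new_2 = pd_new_index.set_index(['date-time']), which raised the same error.) 'date-time' does not look like a special keyword, and that column is currently the index. Why does the error occur?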

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

Answer 1

The reason is that date-time is already the index, here a DatetimeIndex, so it cannot be selected by name the way a column can.

This happens because of the index_col parameter:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',
                  sep=':',
                  header=None, 
                  names  = ['date-time','domain','requests-qty','response-bytes'],
                  parse_dates=[1], 
                  converters={'date-time': to_datetime}, 
                  index_col = 'date-time')
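
To make the point above concrete, here is a minimal sketch on a three-row sample (the sample data and the names flat and df2 are illustrative assumptions, not part of the original file). It shows that 'date-time' is the index name rather than a column label, that set_index on it raises KeyError, and that reset_index() is one way to turn the index back into a column before building a new index:

```python
import pandas as pd
from io import StringIO

# Small stand-in for the real file (illustrative data only)
temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# 'date-time' is now the name of the index, not a column label
print(list(df.columns))
print(df.index.name)

# set_index looks the key up among the columns, so this fails
raised = False
try:
    df.set_index(['date-time'])
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# reset_index() moves the index back into an ordinary column,
# after which it can be used in set_index like any other column
flat = df.reset_index()
df2 = flat.set_index(['date-time', 'domain'])
print(df2.index.names)
```

The design choice here is simply to go through a flat frame first; it costs one extra copy but avoids having to remember which labels currently live in the index.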

For a MultiIndex, pass a list of column names to index_col, remove converters, and specify the column name in the parse_dates parameter:

import pandas as pd
from io import StringIO

temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = ['date-time','domain'])

print (df)

                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
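
Once the values live in a MultiIndex they are still reachable by level name, just not through column selection. The following is a small sketch under the same sample data; the variable names dates and sub are illustrative assumptions:

```python
import pandas as pd
from io import StringIO

# Same illustrative sample as in the answer
temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col=['date-time', 'domain'])

# Read the values of one index level by its name
dates = df.index.get_level_values('date-time')
print(dates)

# Select rows by the value of one level; the 'domain' level is dropped
sub = df.xs('d2', level='domain')
print(sub)
```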

EDIT1: A solution using the append parameter of set_index:

import pandas as pd
from io import StringIO


temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = 'date-time')

print (df)
           domain  requests-qty  response-bytes
date-time                                      
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0

print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'], 
              dtype='datetime64[ns]', name='date-time', freq=None)

df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
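
The append parameter also explains the intermediate step in the question: with the default append=False, set_index replaces the existing index, so after pd1.set_index(['requests-qty','domain']) the old 'date-time' index is gone altogether and cannot be set again. A minimal sketch on the sample data (the names replaced and kept are illustrative assumptions):

```python
import pandas as pd
from io import StringIO

# Same illustrative sample as above
temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# Default append=False: the old 'date-time' index is discarded
replaced = df.set_index(['requests-qty', 'domain'])
print(list(replaced.index.names))                      # ['requests-qty', 'domain']
print('date-time' in replaced.reset_index().columns)   # False

# append=True: the new levels are added after the existing index
kept = df.set_index(['requests-qty', 'domain'], append=True)
print(list(kept.index.names))   # ['date-time', 'requests-qty', 'domain']
```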
