I have a pandas DataFrame pd1:
pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
header=None, names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')
with this index:
>> pd1.index:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
...
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00'],
dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)
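But when I try to set the index to that column, I get the error below. I originally wanted a multi-column index and hit this error. I then tried building another DataFrame with other columns as the index, which worked:
pd_new_index = pd1.set_index(['requests-qty','domain'])
and then creating a new frame from it with the index set back to the 'date-time' column:
pd_new_2 = pd_new_index.set_index(['date-time'])
This gives the same error. 'date-time' does not look like a special keyword, and that column is currently the index. Why does the error occur?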
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'
Answer 0 (score: 1)
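The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name the way a column can. set_index looks its keys up among the columns, which is why it raises KeyError.
It became the index because of the index_col parameter: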
pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=[1],
converters={'date-time': to_datetime},
index_col = 'date-time')
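To make the cause concrete, here is a minimal sketch, assuming a hypothetical three-row sample in the same colon-separated layout as the file in the question. It shows that 'date-time' is the name of the index rather than a column, that set_index therefore fails on it, and that reset_index() turns it back into an ordinary column.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data; the real file from the question is not available here.
sample = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(sample),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# 'date-time' is the name of the index, not one of the columns.
print(df.index.name)               # date-time
print('date-time' in df.columns)   # False

# set_index looks its keys up among the columns, so this raises KeyError.
try:
    df.set_index(['date-time'])
    raised = False
except KeyError:
    raised = True
print(raised)                      # True

# reset_index() moves the index back into an ordinary column.
restored = df.reset_index()
print('date-time' in restored.columns)   # True
```

After reset_index(), 'date-time' can be passed to set_index again like any other column.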
For a MultiIndex, pass a list of column names to index_col, remove converters, and specify the column name in the parse_dates parameter:
import pandas as pd
from io import StringIO
temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace 'StringIO(temp)' with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=['date-time'],
index_col = ['date-time','domain'])
print (df)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0
print (df.index)
MultiIndex([('2016-01-01', 'd1'),
('2016-01-02', 'd2'),
('2016-01-03', 'd3')],
names=['date-time', 'domain'])
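Since index levels cannot be selected with df['date-time'], it may help to see how they are addressed instead. The following is a sketch under the same hypothetical three-row sample; get_level_values and xs are standard pandas calls, while the variable names are only illustrative.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data in the same layout as the question's file.
sample = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(sample),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col=['date-time', 'domain'])

# Levels are read through the index by name, not through df[...].
dates = df.index.get_level_values('date-time')
print(pd.api.types.is_datetime64_any_dtype(dates))   # True

# Cross-section: all rows whose 'domain' level equals 'd2'.
d2_rows = df.xs('d2', level='domain')
print(d2_rows)

# A single level can also be moved back into a column when needed.
domain_as_column = df.reset_index(level='domain')
print(list(domain_as_column.columns))
```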
EDIT1: A solution using set_index with the append parameter:
import pandas as pd
from io import StringIO
temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace 'StringIO(temp)' with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=['date-time'],
index_col = 'date-time')
print (df)
           domain  requests-qty  response-bytes
date-time
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0
print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
dtype='datetime64[ns]', name='date-time', freq=None)
df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0
print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
('2016-01-02', 'd2'),
('2016-01-03', 'd3')],
names=['date-time', 'domain'])
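The second attempt in the question fails for a related reason: with the default append=False, set_index replaces the existing index, so the 'date-time' values are discarded and no later set_index can find them. Below is a minimal sketch of that behaviour, assuming a small hand-built frame that mirrors the question's layout, together with one way to keep the values by calling reset_index() first.

```python
import pandas as pd

# Hypothetical frame mirroring the question's layout: 'date-time' is the index.
pd1 = pd.DataFrame(
    {'domain': ['d1', 'd2', 'd3'],
     'requests-qty': [0, 0, 1],
     'response-bytes': [0, 1, 0]},
    index=pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
                           name='date-time'))

# Default append=False replaces the index, so 'date-time' is dropped entirely.
replaced = pd1.set_index(['requests-qty', 'domain'])
print(list(replaced.index.names))   # ['requests-qty', 'domain']
print(list(replaced.columns))       # ['response-bytes']

# Moving the index into a column first keeps it available for a later set_index.
kept = pd1.reset_index().set_index(['requests-qty', 'domain'])
back = kept.reset_index().set_index('date-time')
print(back.index.name)              # date-time
```

Which variant fits better depends on whether 'date-time' should stay part of the index (append=True) or only be preserved as a column for later use (reset_index() first).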