Question

I have built a multi-level index from DataFrames that are indexed by time:

original_thing
 time                day_1  day_2  day_3 day_4
 2018-05-24 20:00:00  0     0      1     0
 2018-05-25 00:00:00  0     0      0     1
 2018-05-25 04:00:00  0     0      0     1
 2018-05-25 08:00:00  0     0      0     1

I resampled and aggregated the data into separate objects and packed them into a list:

 DF_list = [original_thing, resampled_1, resampled_2]

Then I used pandas concat with code that looks like this:

thisthing = pandas.concat(DF_list, keys=range(len(DF_list)), names=['one','time'], sort=True)

This gives a DataFrame that looks like this:

one  time                   day_1    day_2    day_3    day_4
 2    2018-05-24 00:00:00    0        0        1        0
 1    2018-05-24 12:00:00    0        0        1        0
 0    2018-05-24 20:00:00    0        0        1        0
 0    2018-05-25 00:00:00    0        0        0        1
 1    2018-05-25 00:00:00    0        0        0        1
 2    2018-05-25 00:00:00    0        0        0        1
 0    2018-05-25 04:00:00    0        0        0        1
 0    2018-05-25 08:00:00    0        0        0        1
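
For anyone who wants to reproduce this, here is a minimal sketch of the setup. The question does not say how resampled_1 and resampled_2 were built; the 12-hour and daily resampling with a max aggregation below are assumptions, chosen only because they reproduce the timestamps shown above.

```python
import pandas as pd

# Rebuild the 4-hourly frame shown in the question.
times = pd.to_datetime(
    ["2018-05-24 20:00:00", "2018-05-25 00:00:00",
     "2018-05-25 04:00:00", "2018-05-25 08:00:00"]
)
original_thing = pd.DataFrame(
    {
        "day_1": [0, 0, 0, 0],
        "day_2": [0, 0, 0, 0],
        "day_3": [1, 0, 0, 0],
        "day_4": [0, 1, 1, 1],
    },
    index=pd.DatetimeIndex(times, name="time"),
)

# ASSUMPTION: the resampling rules and the max aggregation are guesses.
resampled_1 = original_thing.resample("12h").max()
resampled_2 = original_thing.resample("D").max()

DF_list = [original_thing, resampled_1, resampled_2]

# keys become the outer index level 'one'; the DatetimeIndex becomes 'time'.
thisthing = pd.concat(
    DF_list, keys=range(len(DF_list)), names=["one", "time"], sort=True
)

# Order rows by timestamp first, as in the table above.
thisthing = thisthing.sort_index(level="time")
print(thisthing)
```

Note that 'one' ends up as an index level here, not as a column.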

I want to take the index level "one" and one-hot encode it to get:

one  time                   id_1  id_2  id_3 day_...    
 2    2018-05-24 00:00:00    0     0     1    0
 1    2018-05-24 12:00:00    0     1     0    0
 0    2018-05-24 20:00:00    1     0     0    0
 0    2018-05-25 00:00:00    1     0     0    1
 1    2018-05-25 00:00:00    0     1     0    1
 2    2018-05-25 00:00:00    0     0     1    1
 0    2018-05-25 04:00:00    1     0     0    1
 0    2018-05-25 08:00:00    1     0     0    1

where id_'#' is the encoded index from "one".

I tried to encode it like this:

conc_ohlc_dummies= pandas.get_dummies(conc_ohlc['one'], prefix= 'hours')

but I get this error:

    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'one'

I also tried reindexing it to get rid of the index values. Is there a way to do this other than writing to csv and reading it back in?

Thanks

Answer 1

You can use OneHotEncoder from sklearn.

Let's start with some boilerplate code:

 import pandas as pd
 import numpy as np
 from sklearn.preprocessing import OneHotEncoder
 df = pd.DataFrame({"one":[2,1,0,0,1,2], "abcd":[4,6,3,6,7,1]})
 print(df)

   one  abcd
0    2     4
1    1     6
2    0     3
3    0     6
4    1     7
5    2     1

Now you can fit a one-hot encoder object to these values...

ohe = OneHotEncoder()
ohe.fit( df.one.values.reshape(-1, 1) )
vals = ohe.transform( df.one.values.reshape(-1, 1) ).toarray()
print(vals)

array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Now just insert them into the DataFrame:

for i in range(vals.shape[1]):
    df['id_{}'.format(i)] = vals[:, i]

The final DataFrame should look like this:

   one  abcd  id_0  id_1  id_2
0    2     4   0.0   0.0   1.0
1    1     6   0.0   1.0   0.0
2    0     3   1.0   0.0   0.0
3    0     6   1.0   0.0   0.0
4    1     7   0.0   1.0   0.0
5    2     1   0.0   0.0   1.0
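
In the question, 'one' is an index level and not a column, so df.one is not available there. The sketch below shows one way to adapt the idea: pull the level out with get_level_values and encode that array. To keep the example dependency-free, a plain NumPy identity-matrix lookup is swapped in for OneHotEncoder; it is not the sklearn API. The frame, its timestamps, and the 'abcd' column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose 'one' values live in the index, as in the question.
idx = pd.MultiIndex.from_arrays(
    [[2, 1, 0, 0, 1, 2], pd.date_range("2018-05-24", periods=6, freq="D")],
    names=["one", "time"],
)
df = pd.DataFrame({"abcd": [4, 6, 3, 6, 7, 1]}, index=idx)

# Extract the level as a plain array; this is what would be reshaped
# and handed to an encoder in the code above.
level_vals = df.index.get_level_values("one").to_numpy()

# NumPy stand-in for the encoder: map each value to a category code,
# then pick the matching row of an identity matrix.
categories, codes = np.unique(level_vals, return_inverse=True)
vals = np.eye(len(categories), dtype=int)[codes]

# Name each new column after the category it encodes.
for i, cat in enumerate(categories):
    df["id_{}".format(cat)] = vals[:, i]

print(df)
```

Using an integer dtype also avoids the 0.0/1.0 float columns seen above.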

Answer 2

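Initially I tried to remove all the indexes from the DataFrame with .reindex(), but it turned out that .reset_index() is what works. With the index out of the way, .get_dummies() and .merge() did the encoding and added the information for me. I did have to set the index again and then sort it to get a good result: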

    thisthing= thisthing.reset_index()
    thisthing_dummies= pandas.get_dummies(thisthing['one'], prefix='hours', drop_first=True)
    thisthing= thisthing.merge(thisthing_dummies, left_index=True, right_index=True)
    thisthing= thisthing.set_index(['time','one'])
    thisthing.sort_values(by=['time', 'one'],inplace=True)
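
The KeyError in the question comes from thisthing['one'] looking for a column, while 'one' is an index level. As a possible alternative to the reset_index round trip, the sketch below encodes the level directly. The small frame is a made-up stand-in for thisthing, the 'id' prefix is taken from the desired output in the question, and drop_first is left out so that all three indicator columns are kept.

```python
import pandas as pd

# Hypothetical stand-in for `thisthing` with the same index layout.
idx = pd.MultiIndex.from_arrays(
    [
        [2, 1, 0, 0, 1, 2],
        pd.to_datetime(
            ["2018-05-24 00:00", "2018-05-24 12:00", "2018-05-24 20:00",
             "2018-05-25 00:00", "2018-05-25 00:00", "2018-05-25 00:00"]
        ),
    ],
    names=["one", "time"],
)
thisthing = pd.DataFrame(
    {"day_3": [1, 1, 1, 0, 0, 0], "day_4": [0, 0, 0, 1, 1, 1]}, index=idx
)

# Encode the index level itself; no column named 'one' is needed.
dummies = pd.get_dummies(
    thisthing.index.get_level_values("one"), prefix="id", dtype=int
)

# get_dummies returns a fresh 0..n-1 index here, so reattach the MultiIndex
# before joining the indicator columns to the data.
dummies.index = thisthing.index
encoded = pd.concat([dummies, thisthing], axis=1)
print(encoded)
```

Because the MultiIndex is never dropped, there is no need to call set_index again afterwards.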

How do I take one level of my pandas hierarchical index and one-hot encode it?
