Create a matrix with labels on each cell by intervals

Time: 2018-06-12 08:16:34

Tags: python python-3.x pandas numpy

I have bins and data that I want to use to fill an observation matrix:

a = array([0.,  14.,  29.,  43.,  58.,  72.,  86., 101., 115., 130., 144.])
b = array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

The result I expect:

   0-13 14-28 29-42 43-57 58-71 72-85 86-100 101-114 115-129 130-144
10  1     0     0     0     0     0     0       0       0       0    
26  0     1     0     0     0     0     0       0       0       0 
36  0     0     1     0     0     0     0       0       0       0 
48  0     0     0     1     0     0     0       0       0       0 
64  0     0     0     0     1     0     0       0       0       0 
71  0     0     0     0     1     0     0       0       0       0 
91  0     0     0     0     0     0     1       0       0       0 

2 Answers:

Answer 0: (score: 2)

cut + get_dummies

Here is one way:

import numpy as np
import pandas as pd

a = np.array([0.,  14.,  29.,  43.,  58.,  72.,  86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

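df = pd.DataFrame({'Values': b})

# Assign each value to its bin; intervals are right-closed by default, e.g. (0, 14]
df['Range'] = pd.cut(df['Values'], a)

# One indicator column per bin
dummies = pd.get_dummies(df['Range'])

# Attach the indicator columns to the original values
res = pd.concat([df, dummies], axis=1)

print(res)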

Explanation

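  • pandas.cut uses default labels derived from the bin ranges when none are supplied.
  • pandas.get_dummies expands a series into "one-hot encoded" format.
  • pandas.concat lets you join the original dataframe with the output of get_dummies.
  • Optionally, you can make Values the index via res = res.set_index('Values').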
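The optional last step can be sketched as follows. This is a minimal sketch, assuming the same a and b as in the question. Dropping the helper Range column and passing dtype=int are choices made here for illustration and are not part of the original answer; dtype=int is included because pandas 2.0 and later return boolean columns from get_dummies by default.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

df = pd.DataFrame({'Values': b})
df['Range'] = pd.cut(df['Values'], a)

# dtype=int forces 0/1 columns (assumption: newer pandas would otherwise print True/False)
dummies = pd.get_dummies(df['Range'], dtype=int)

# Use the observed values as the row index and keep only the indicator columns
res = pd.concat([df, dummies], axis=1).set_index('Values').drop(columns='Range')
print(res)
```

Each row of res then contains a single 1 in the column of the bin that the value falls into, and bins with no observations stay as all-zero columns.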

Result

print(res)

    Values       Range  (0, 14]  (14, 29]  (29, 43]  (43, 58]  (58, 72]  \
0       10     (0, 14]        1         0         0         0         0   
1       26    (14, 29]        0         1         0         0         0   
2       36    (29, 43]        0         0         1         0         0   
3       48    (43, 58]        0         0         0         1         0   
4       64    (58, 72]        0         0         0         0         1   
5       71    (58, 72]        0         0         0         0         1   
6       91   (86, 101]        0         0         0         0         0   
7      105  (101, 115]        0         0         0         0         0   
8      123  (115, 130]        0         0         0         0         0   
9      133  (130, 144]        0         0         0         0         0   
10     141  (130, 144]        0         0         0         0         0   

    (72, 86]  (86, 101]  (101, 115]  (115, 130]  (130, 144]  
0          0          0           0           0           0  
1          0          0           0           0           0  
2          0          0           0           0           0  
3          0          0           0           0           0  
4          0          0           0           0           0  
5          0          0           0           0           0  
6          0          1           0           0           0  
7          0          0           1           0           0  
8          0          0           0           1           0  
9          0          0           0           0           1  
10         0          0           0           0           1  

Answer 1: (score: 2)

Use get_dummies together with cut, then call set_index to use the b array as the index:

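# Build labels such as '0-13' from consecutive pairs of bin edges
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a[:-1].astype(int), a[1:].astype(int))]
# Bin b, one-hot encode the bins, and index the rows by the original values
d = pd.get_dummies(pd.cut(b, a, labels=labels)).set_index(b)
print(d)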
     0-13  14-28  29-42  43-57  58-71  72-85  86-100  101-114  115-129  \
10      1      0      0      0      0      0       0        0        0   
26      0      1      0      0      0      0       0        0        0   
36      0      0      1      0      0      0       0        0        0   
48      0      0      0      1      0      0       0        0        0   
64      0      0      0      0      1      0       0        0        0   
71      0      0      0      0      1      0       0        0        0   
91      0      0      0      0      0      0       1        0        0   
105     0      0      0      0      0      0       0        1        0   
123     0      0      0      0      0      0       0        0        1   
133     0      0      0      0      0      0       0        0        0   
141     0      0      0      0      0      0       0        0        0   

     130-143  
10         0  
26         0  
36         0  
48         0  
64         0  
71         0  
91         0  
105        0  
123        0  
133        1  
141        1  

If you want the last label to end at 144 instead, here is a solution:

a1 = a[:-1].astype(int)  # left edge of each bin
a2 = a[1:].astype(int)   # right edge of each bin
a2[-1] += 1              # so that j - 1 yields 144 for the last label
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a1, a2)]
d = pd.get_dummies(pd.cut(b, a, labels=labels)).set_index(b)
print(d)
     0-13  14-28  29-42  43-57  58-71  72-85  86-100  101-114  115-129  \
10      1      0      0      0      0      0       0        0        0   
26      0      1      0      0      0      0       0        0        0   
36      0      0      1      0      0      0       0        0        0   
48      0      0      0      1      0      0       0        0        0   
64      0      0      0      0      1      0       0        0        0   
71      0      0      0      0      1      0       0        0        0   
91      0      0      0      0      0      0       1        0        0   
105     0      0      0      0      0      0       0        1        0   
123     0      0      0      0      0      0       0        0        1   
133     0      0      0      0      0      0       0        0        0   
141     0      0      0      0      0      0       0        0        0   

     130-144  
10         0  
26         0  
36         0  
48         0  
64         0  
71         0  
91         0  
105        0  
123        0  
133        1  
141        1
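
One caveat about these labels: pd.cut closes bins on the right by default, so a value that sits exactly on an edge, such as 14, lands in the bin labelled 0-13. No value in the question's data hits an edge, so the output above is unaffected. Below is a minimal sketch of a left-closed variant, assuming that is what labels like 0-13 are meant to express; the array name edge_values is hypothetical and only there to illustrate the boundary behaviour.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a[:-1].astype(int), a[1:].astype(int))]

# Hypothetical values chosen to sit on or next to bin edges
edge_values = np.array([0, 13, 14, 143])

# Default: right-closed bins (lo, hi], so 0 falls outside and 14 goes to '0-13'
default_bins = pd.cut(edge_values, a, labels=labels)

# right=False: left-closed bins [lo, hi), so 0 goes to '0-13' and 14 to '14-28'
left_closed = pd.cut(edge_values, a, labels=labels, right=False)

print(list(default_bins))
print(list(left_closed))
```

With right=False the last bin becomes [130, 144), so a value of exactly 144 would be left unbinned; whether that matters depends on the data.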