I have bins and data that I want to use to fill an observation matrix:
a = array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])
The result I expect:
0-13 14-28 29-42 43-57 58-71 72-85 86-100 101-114 115-129 130-144
10 1 0 0 0 0 0 0 0 0 0
26 0 1 0 0 0 0 0 0 0 0
36 0 0 1 0 0 0 0 0 0 0
48 0 0 0 1 0 0 0 0 0 0
64 0 0 0 0 1 0 0 0 0 0
71 0 0 0 0 1 0 0 0 0 0
91 0 0 0 0 0 0 1 0 0 0
Answer 0 (score: 2)
Here is one way:
import numpy as np
import pandas as pd
a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])
df = pd.DataFrame({'Values': b})
df['Range'] = pd.cut(df['Values'], a)
dummies = pd.get_dummies(df['Range'])
res = pd.concat([df, dummies], axis=1)
print(res)
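Explanation
pandas.cut uses default labels derived from the ranges when none are provided.
pandas.get_dummies expands a series into one-hot encoded format.
pandas.concat lets you join the original dataframe to the output of get_dummies.
To make Values the index, use res = res.set_index('Values').
Result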
print(res)
Values Range (0, 14] (14, 29] (29, 43] (43, 58] (58, 72] \
0 10 (0, 14] 1 0 0 0 0
1 26 (14, 29] 0 1 0 0 0
2 36 (29, 43] 0 0 1 0 0
3 48 (43, 58] 0 0 0 1 0
4 64 (58, 72] 0 0 0 0 1
5 71 (58, 72] 0 0 0 0 1
6 91 (86, 101] 0 0 0 0 0
7 105 (101, 115] 0 0 0 0 0
8 123 (115, 130] 0 0 0 0 0
9 133 (130, 144] 0 0 0 0 0
10 141 (130, 144] 0 0 0 0 0
(72, 86] (86, 101] (101, 115] (115, 130] (130, 144]
0 0 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 1 0 0 0
7 0 0 1 0 0
8 0 0 0 1 0
9 0 0 0 0 1
10 0 0 0 0 1
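To get closer to the layout asked for in the question (values as row labels, one column per bin), the same steps can be arranged as in the sketch below. The name matrix is only illustrative, and dtype=int is passed on the assumption that you want 0/1 rather than booleans, which some pandas versions return by default.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

df = pd.DataFrame({'Values': b})
df['Range'] = pd.cut(df['Values'], a)

# One-hot encode the bins; dtype=int requests 0/1 instead of booleans.
dummies = pd.get_dummies(df['Range'], dtype=int)

# Use the observed values as the row index instead of keeping them as a column.
matrix = dummies.set_index(df['Values'])
print(matrix)
```

Because Range is categorical, empty bins such as (72, 86] still get a column of zeros.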
Answer 1 (score: 2)
Use get_dummies with cut, then call set_index to use the b array as the index:
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a[:-1].astype(int), a[1:].astype(int))]
d = pd.get_dummies((pd.cut(b, a, labels=labels))).set_index(b)
print (d)
0-13 14-28 29-42 43-57 58-71 72-85 86-100 101-114 115-129 \
10 1 0 0 0 0 0 0 0 0
26 0 1 0 0 0 0 0 0 0
36 0 0 1 0 0 0 0 0 0
48 0 0 0 1 0 0 0 0 0
64 0 0 0 0 1 0 0 0 0
71 0 0 0 0 1 0 0 0 0
91 0 0 0 0 0 0 1 0 0
105 0 0 0 0 0 0 0 1 0
123 0 0 0 0 0 0 0 0 1
133 0 0 0 0 0 0 0 0 0
141 0 0 0 0 0 0 0 0 0
130-143
10 0
26 0
36 0
48 0
64 0
71 0
91 0
105 0
123 0
133 1
141 1
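One thing worth checking is how values that sit exactly on a bin edge are labelled. By default pd.cut closes each interval on the right, so the label '0-13' actually covers (0, 14]. The sketch below compares that with right=False on a few hypothetical edge values (edge_values is not part of the original data) and reuses the labels built above.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a[:-1].astype(int), a[1:].astype(int))]

# Hypothetical values lying on or next to bin edges.
edge_values = np.array([13, 14, 28, 29])

# Default (right=True): bins are (0, 14], (14, 29], ... so 14 is labelled '0-13'.
right_closed = pd.cut(edge_values, a, labels=labels)

# right=False: bins are [0, 14), [14, 29), ... which matches labels such as '14-28'.
left_closed = pd.cut(edge_values, a, labels=labels, right=False)

print(list(right_closed))  # ['0-13', '0-13', '14-28', '14-28']
print(list(left_closed))   # ['0-13', '14-28', '14-28', '29-42']
```

For the b array in the question no value falls on an edge, so both settings give the same matrix. Note that with right=False a value of exactly 144 falls outside the last bin, and with the default a value of exactly 0 falls outside the first one.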
If you want the last label to end at 144, here is a solution:
a1 = a[:-1].astype(int)
a2 = a[1:].astype(int)
a2[-1] += 1
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a1, a2)]
d = pd.get_dummies((pd.cut(b, a, labels=labels))).set_index(b)
print (d)
0-13 14-28 29-42 43-57 58-71 72-85 86-100 101-114 115-129 \
10 1 0 0 0 0 0 0 0 0
26 0 1 0 0 0 0 0 0 0
36 0 0 1 0 0 0 0 0 0
48 0 0 0 1 0 0 0 0 0
64 0 0 0 0 1 0 0 0 0
71 0 0 0 0 1 0 0 0 0
91 0 0 0 0 0 0 1 0 0
105 0 0 0 0 0 0 0 1 0
123 0 0 0 0 0 0 0 0 1
133 0 0 0 0 0 0 0 0 0
141 0 0 0 0 0 0 0 0 0
130-144
10 0
26 0
36 0
48 0
64 0
71 0
91 0
105 0
123 0
133 1
141 1
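If only the 0/1 matrix is needed and not the labelled DataFrame, a NumPy-only variant is possible. This is a different technique from the answers above, a minimal sketch using np.digitize, and it assumes every value lies in [a[0], a[-1]); a value of 144 or more would index past the last bin and raise an error. The names bin_index and one_hot are illustrative.

```python
import numpy as np

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

# np.digitize returns i such that a[i-1] <= value < a[i]; subtract 1 for a 0-based bin index.
bin_index = np.digitize(b, a) - 1

# Row k of the identity matrix is the one-hot vector for bin k.
one_hot = np.eye(len(a) - 1, dtype=int)[bin_index]
print(one_hot)
```

Rows follow the order of b and columns follow the order of the bins, so the column labels would have to be attached separately.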