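I have a continuous variable that I am trying to split into 10 bins. More than 50% of its values are zero, so I use the following code to create bins of unequal size: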
import pandas as pd
import numpy as np
import pandas.core.algorithms as algos
from pandas import Series
bins = algos.quantile(np.unique(df['highlight']), np.linspace(0, 1, 11))
result = pd.tools.tile._bins_to_cuts(df['highlight'], bins, include_lowest=True)
result.value_counts()
[0, 78.3] 2152235
(78.3, 156.6] 93257
(156.6, 234.9] 37539
(234.9, 313.2] 17740
(313.2, 391.5] 11781
(391.5, 478.8] 8334
(478.8, 577.2] 7503
(577.2, 711.4] 6216
(711.4, 890.4] 6184
(890.4, 4972] 5539
Name: highlight, dtype: int64
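As shown above, I have created the bins. Now I just want to assign a number to each bin, and thereby to each value in the variable. Ultimately I want to replace the values with their bin numbers in the same variable.
For example:
The value 38 would get bin #1
The value 97 would get bin #2
and so on...
How can I do this?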
Answer 0 (score: 0)
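# Use == to compare; a single = would be parsed as a keyword argument to np.where.
# Assumes the bin labels compare equal to these strings, as in the output above.
df['assigned'] = np.where(result == '[0, 78.3]', 1, 0)
df['assigned'] = np.where(result == '(78.3, 156.6]', 2, df['assigned'])
df['assigned'] = np.where(result == '(156.6, 234.9]', 3, df['assigned'])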
...
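A possibly simpler route, offered only as a sketch and not as part of the answer above, is to let the public pd.cut function return integer bin codes directly instead of matching label strings one bin at a time. The edges below are copied from the value_counts() output in the question; the sample frame and the column name highlight_bin are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Bin edges copied from the value_counts() output in the question.
edges = [0, 78.3, 156.6, 234.9, 313.2, 391.5, 478.8, 577.2, 711.4, 890.4, 4972]

# Hypothetical sample values standing in for df['highlight'].
df = pd.DataFrame({"highlight": [0, 38, 97, 200, 4972]})

# labels=False makes pd.cut return 0-based integer bin codes instead of
# interval labels; adding 1 turns them into bin numbers 1..10.
# include_lowest=True keeps the value 0 inside the first bin.
df["highlight_bin"] = (
    pd.cut(df["highlight"], bins=edges, labels=False, include_lowest=True) + 1
)

print(df)
```

To overwrite the original values, as the question asks, assign the result back to df["highlight"] instead of a new column. Values that fall outside the supplied edges come back as NaN, so the edges should span the full range of the data.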