Creating bins based on a condition

Time: 2020-10-23 22:24:06

Tags: python pandas bins

My original dataset looks like the example below:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b |
|----|-------|-------|-------|-------|----------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     |

Here is the DataFrame:

```python
import pandas as pd

data = [[1, 350, 6, 35, 0], [2, 164, 79, 6, 2], [3, 10, 0, 1, 1], [4, 120, 1, 10, 0]]
df = pd.DataFrame(data, columns=['id', 'old_a', 'new_a', 'old_b', 'new_b'])
```

I obtained the columns `ratio_a` and `ratio_b` (as shown in the table) with the following code:

```python
# Element-wise ratios; a zero denominator produces inf
df['ratio_a'] = df['old_a'] / df['new_a']
df['ratio_b'] = df['old_b'] / df['new_b']
```
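For reference, dividing pandas Series element-wise by zero does not raise an error: a positive numerator divided by zero gives `inf`, which is where the `Inf` entries in the table come from. A minimal check, reusing the sample data above:

```python
import numpy as np
import pandas as pd

data = [[1, 350, 6, 35, 0], [2, 164, 79, 6, 2], [3, 10, 0, 1, 1], [4, 120, 1, 10, 0]]
df = pd.DataFrame(data, columns=['id', 'old_a', 'new_a', 'old_b', 'new_b'])

# Element-wise division; a zero denominator yields inf instead of raising
df['ratio_a'] = df['old_a'] / df['new_a']
df['ratio_b'] = df['old_b'] / df['new_b']

# ids of the rows whose denominator was zero
print(df.loc[np.isinf(df['ratio_a']), 'id'].tolist())  # [3]
print(df.loc[np.isinf(df['ratio_b']), 'id'].tolist())  # [1, 4]
```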

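Next, I want to create two more columns holding the numeric range that each value of `ratio_a` and `ratio_b` falls into. To do that, I wrote the following code: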

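```python
# Edges 0, 10, ..., 100 define ten bins labelled '0-10', '10-20', ..., '90-100'
bins = [0,10,20,30,40,50,60,70,80,90,100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
# include_lowest=True makes the first interval include 0
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)
```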
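The reason 120 and `inf` are left unbinned is that `pd.cut` returns NaN for any value outside the outermost edges, here `[0, 100]`. A short demonstration with a few sample ratios (the values are illustrative, taken from the table above):

```python
import numpy as np
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# One ratio inside the range, one above 100, one infinite
values = pd.Series([58.333333, 120.0, np.inf])
binned = pd.cut(values, bins=bins, labels=labels, include_lowest=True)

# Only the in-range value receives a label; the other two are NaN
print(binned.isna().tolist())  # [False, True, True]
```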

The problem is that any value of `ratio_a` or `ratio_b` greater than 100 should fall into a bucket labelled `>100`. How can I achieve this? My final result should look like this:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b | a_range | b_range |
|----|-------|-------|-------|-------|----------|---------|---------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     | 50-60   | NaN     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       | 0-10    | 0-10    |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       | NaN     | 0-10    |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     | >100    | NaN     |

1 Answer

Answer 0 (score: 1)

One possible solution is to add `np.inf` as the final bin edge and relabel the last bin as `>100`:

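```python
import numpy as np
import pandas as pd

# Append np.inf as the last edge so values above 100 fall into the final bin
bins = [0,10,20,30,40,50,60,70,80,90,100,np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1] = ">100"  # rename '100-inf' to '>100'
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)
```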

Result:

```
id  old_a  new_a  old_b  new_b     ratio_a  ratio_b a_range b_range
 1    350      6     35      0   58.333333      inf   50-60     NaN
 2    164     79      6      2    2.075949      3.0    0-10    0-10
 3     10      0      1      1         inf      1.0     NaN    0-10
 4    120      1     10      0  120.000000      inf    >100     NaN
```
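One caveat: with `np.inf` as the last edge, the final interval is `(100, inf]`, closed on the right, so `pd.cut` may well place infinite ratios in `>100` instead of leaving them as NaN as the question's expected output shows. The output you get can therefore differ from the table above. A minimal sketch that makes the NaN explicit, assuming infinite ratios should stay unbinned, replaces ±inf with NaN (via `Series.replace`) before cutting. The loop variables `col` and `out` are just illustrative names:

```python
import numpy as np
import pandas as pd

data = [[1, 350, 6, 35, 0], [2, 164, 79, 6, 2], [3, 10, 0, 1, 1], [4, 120, 1, 10, 0]]
df = pd.DataFrame(data, columns=['id', 'old_a', 'new_a', 'old_b', 'new_b'])
df['ratio_a'] = df['old_a'] / df['new_a']
df['ratio_b'] = df['old_b'] / df['new_b']

# Edges 0, 10, ..., 100, then an open-ended last bin up to infinity
bins = list(range(0, 101, 10)) + [np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1] = '>100'

for col, out in [('ratio_a', 'a_range'), ('ratio_b', 'b_range')]:
    # Treat +/-inf as missing so those rows stay NaN instead of landing in '>100'
    finite = df[col].replace([np.inf, -np.inf], np.nan)
    df[out] = pd.cut(finite, bins=bins, labels=labels, include_lowest=True)

print(df[['id', 'a_range', 'b_range']])
```

The replacement is applied only to a temporary Series, so `ratio_a` and `ratio_b` themselves keep their `inf` values.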