Question

My original dataset looks like the following example:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b |
|----|-------|-------|-------|-------|----------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     |

Here is the DataFrame:

import pandas as pd

data = [[1,350,6,35,0],[2,164,79,6,2],[3,10,0,1,1],[4,120,1,10,0]]
df = pd.DataFrame(data, columns=['id','old_a','new_a','old_b','new_b'])

I computed the columns "ratio_a" and "ratio_b" (shown in the table) with the following code:

df['ratio_a']= df['old_a']/df['new_a']
df['ratio_b']= df['old_b']/df['new_b']

Next, I want to create two more columns holding the numeric range that each value of ratio_a and ratio_b falls into. For that, I wrote the following code:

bins = [0,10,20,30,40,50,60,70,80,90,100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)

The problem I ran into is that any value of ratio_a or ratio_b greater than 100 should fall into a '>100' bucket. How can I achieve this? My final result should look like this:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b | a_range | b_range |
|----|-------|-------|-------|-------|----------|---------|---------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     | 50-60   | NaN     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       | 0-10    | 0-10    |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       | NaN     | 0-10    |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     | >100    | NaN     |
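
The behaviour described above can be reproduced in isolation. This is only a minimal sketch, using a small hypothetical Series named `ratios` in place of the real columns: with the original bin edges, `pd.cut` leaves any value above the last edge unlabelled (NaN) instead of putting it in a catch-all bucket.

```python
import pandas as pd

# Hypothetical sample values standing in for the ratio columns.
ratios = pd.Series([58.333333, 2.075949, 120.0])

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# 120 lies outside every interval, so pd.cut returns NaN for it.
ranges = pd.cut(ratios, bins=bins, labels=labels, include_lowest=True)
print(ranges.tolist())
```

The first two values receive the labels '50-60' and '0-10', while 120 comes back as NaN, which is why an extra open-ended bin is needed.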

Answer 1

One possible solution:

import numpy as np

# Add np.inf as the last edge so that values above 100 get their own bin
bins = [0,10,20,30,40,50,60,70,80,90,100,np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1] = ">100"  # rename the generated '100-inf' label
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)

Result:

id  old_a  new_a  old_b  new_b     ratio_a  ratio_b a_range b_range
 1    350      6     35      0   58.333333      inf   50-60     NaN
 2    164     79      6      2    2.075949      3.0    0-10    0-10
 3     10      0      1      1         inf      1.0     NaN    0-10
 4    120      1     10      0  120.000000      inf    >100     NaN
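
If the infinite ratios (from dividing by zero) must stay NaN as in the desired output, it may be safer to mask them explicitly before binning. Because the last interval is closed on the right at `np.inf`, an infinite ratio may be labelled '>100' rather than left as NaN, depending on how pandas treats that edge. The sketch below is one way to do this, not the only one; the helper name `bucket_ratios` and its parameters `step` and `upper` are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

def bucket_ratios(ratios, step=10, upper=100):
    """Label ratios '0-10', ..., '90-100', '>100'; non-finite values stay NaN."""
    # Edges 0, 10, ..., upper, plus an open-ended final edge.
    edges = list(range(0, upper + step, step)) + [np.inf]
    labels = ['{}-{}'.format(i, j) for i, j in zip(edges[:-2], edges[1:-1])]
    labels.append('>{}'.format(upper))
    # Mask +/-inf first so a division by zero is not counted as '>100'.
    finite = ratios.replace([np.inf, -np.inf], np.nan)
    return pd.cut(finite, bins=edges, labels=labels, include_lowest=True)

df = pd.DataFrame(
    [[1, 350, 6, 35, 0], [2, 164, 79, 6, 2], [3, 10, 0, 1, 1], [4, 120, 1, 10, 0]],
    columns=['id', 'old_a', 'new_a', 'old_b', 'new_b'],
)
df['ratio_a'] = df['old_a'] / df['new_a']
df['ratio_b'] = df['old_b'] / df['new_b']
df['a_range'] = bucket_ratios(df['ratio_a'])
df['b_range'] = bucket_ratios(df['ratio_b'])
print(df)
```

With the sample data, rows whose ratio is infinite get NaN in the range column, and the ratio of 120 is labelled '>100'.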
