Question

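I want to classify numbers according to ranges that I define myself.

With a lambda this is easy, but what if there are more than two conditions? I tried the following, but it doesn't change anything: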

import pandas as pd

country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

for x in country.POPULATION:
    if x < 2000:
        x = 'small'
    elif x > 2000 and x <= 4000:
        x = 'medium'
    elif x > 5000 and x <= 6000:
        x = 'big'
    else:
        'huge'

I want the data to come back labeled 'small', 'medium', and so on, according to the range each value falls in.
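For context, the loop above has no effect because assigning to the loop variable x only rebinds a local name; it never writes back into the DataFrame, and the final branch is a bare string with no assignment at all. A minimal sketch of one way to keep the same if/elif logic is to move it into a function and apply it to the column. The function name size_label and the column name SIZE are made up for this illustration.

```python
import pandas as pd

def size_label(x):
    """Return a size label for one population value (same ranges as the loop)."""
    if x < 2000:
        return 'small'
    elif 2000 < x <= 4000:
        return 'medium'
    elif 5000 < x <= 6000:
        return 'big'
    # Everything else, including the 4000-5000 gap and exactly 2000
    return 'huge'

country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

# apply() calls size_label once per value and returns a new Series,
# which is then stored as a new column (hypothetical name SIZE).
country['SIZE'] = country['POPULATION'].apply(size_label)
```

This runs a Python function per row, so it is usually slower on large frames than a vectorized approach.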

Answer 1

I would use np.select for multiple conditions:

import numpy as np

conditions = [
    country['POPULATION'] < 2000,
    ((country['POPULATION'] > 2000) & (country['POPULATION'] <= 4000)),
    ((country['POPULATION'] > 5000) & (country['POPULATION'] <= 6000))
]

choices = [
    'small',
    'medium',
    'big'
]

# Create a new column, or assign the result to an existing one.
# The last argument to np.select is the default, used where no condition matches.
country['new'] = np.select(conditions, choices, 'huge')

  COUNTRY  POPULATION  ECONOMY     new
0   China        1200    86212   small
1   JAPAN        2345    11862  medium
2   KOREA        3400     1000  medium
3     USA        5600     8555     big
4      UK        9600    12000    huge
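To make the role of the default explicit, here is a small sketch using the same conditions on a few extra values that are not in the original data (2000 and 4500 are chosen only for illustration). np.select takes the choice for the first condition that is true and falls back to the default when none is, so any value sitting in a gap between the ranges ends up as 'huge'.

```python
import numpy as np
import pandas as pd

# Illustrative values only: 2000 sits on a boundary, 4500 in the 4000-5000 gap.
population = pd.Series([1200, 2000, 4500, 5600, 9600])

conditions = [
    population < 2000,
    (population > 2000) & (population <= 4000),
    (population > 5000) & (population <= 6000),
]
choices = ['small', 'medium', 'big']

# Rows matching no condition receive the default.
labels = np.select(conditions, choices, default='huge')
```

Here 2000 and 4500 both fall through to 'huge', which matches the else branch in the question. If that is not the intent, the conditions need to be adjusted so the ranges touch.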

Answer 2

The np.select approach from @Chris looks good, but I had already written an answer using pd.cut (see docs), so I might as well post it:

import pandas as pd
df = pd.DataFrame({'COUNTRY':['China','JAPAN','KOREA', 'USA', 'UK'],
               'POPULATION':[1200,2345,3400,5600,9600],
               'ECONOMY':[86212,11862,1000, 8555,12000]})

df["size"] = pd.cut(df["POPULATION"],
                bins=[0, 2000, 4000, 5000, 6000, df.POPULATION.max()],
                labels=["Small", "Medium", "NaN", "Large", "Huge"])

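It's a little awkward, because the gap between 4,000 and 5,000 has to be handled by giving it an arbitrary label (here I wrote "NaN", but that is wrong: it is just the string "NaN", not a real missing value).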
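One possible way around the arbitrary label, sketched here under a few assumptions, is to give the gap the same label as the catch-all bucket. pd.cut accepts repeated labels only when ordered=False is passed, which as far as I know requires pandas 1.1 or later. The open-ended upper edge float('inf') and the extra value 4500 are also choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# 4500 is added only to show what happens in the 4000-5000 gap.
population = pd.Series([1200, 2345, 3400, 5600, 9600, 4500])

size = pd.cut(
    population,
    bins=[0, 2000, 4000, 5000, 6000, float('inf')],
    # "Huge" appears twice: once for the gap, once for everything above 6000.
    labels=['Small', 'Medium', 'Huge', 'Large', 'Huge'],
    ordered=False,  # needed for duplicate labels
)
```

Note that pd.cut bins are closed on the right by default, so exactly 2000 is labeled Small here, whereas the loop in the question would send it to the else branch.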

How to bin quickly in Python with multiple conditions
