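How to bin values quickly in Python with multiple conditions

Time: 2019-02-02 01:24:03

Tags: python pandas

I want to classify numbers according to ranges that I define myself.

A lambda is simple, but what should I do when there are more than two conditions? I tried the loop below, but it doesn't change anything.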

import pandas as pd

country = pd.DataFrame({'COUNTRY':['China','JAPAN','KOREA', 'USA', 'UK'],
               'POPULATION':[1200,2345,3400,5600,9600],
               'ECONOMY':[86212,11862,1000, 8555,12000]})

for x in country.POPULATION:
    if x < 2000:
        x = 'small'
    elif x >2000 and x <=4000:
        x='medium'
    elif x > 5000 and x <=6000:
        x='big'
    else:
        'huge'

I expect the data to come back as "small", "medium", and so on, according to the range.
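For context, the loop above most likely changes nothing because assigning to `x` only rebinds the loop variable and never writes back to the DataFrame, and the bare `'huge'` in the `else` branch is an expression that is simply discarded. A minimal sketch of the same rules written as a function and applied to the column, assuming a hypothetical helper name `classify_population` and a hypothetical new column `SIZE` (neither is from the original post):

```python
import pandas as pd

country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})


def classify_population(x):
    # Hypothetical helper: same ranges as the loop in the question,
    # but each branch returns its label instead of rebinding x.
    if x < 2000:
        return 'small'
    elif x > 2000 and x <= 4000:
        return 'medium'
    elif x > 5000 and x <= 6000:
        return 'big'
    else:
        return 'huge'


# apply() calls the function once per value; the result is stored in a
# new column (the name 'SIZE' is an assumption for this sketch).
country['SIZE'] = country['POPULATION'].apply(classify_population)
print(country)
```

This runs the function once per row in Python, so it is usually slower on large frames than the vectorized approaches in the answers.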

2 answers:

Answer 0: (score: 1)

I would use np.select for multiple conditions

import numpy as np

conditions = [
    country['POPULATION'] < 2000,
    ((country['POPULATION'] > 2000) & (country['POPULATION'] <= 4000)),
    ((country['POPULATION'] > 5000) & (country['POPULATION'] <=6000))
]

choices = [
    'small',
    'medium',
    'big'
]

# create a new column or assign it to an existing
# the last param in np.select is default
country['new'] = np.select(conditions, choices, 'huge')

  COUNTRY  POPULATION  ECONOMY     new
0   China        1200    86212   small
1   JAPAN        2345    11862  medium
2   KOREA        3400     1000  medium
3     USA        5600     8555     big
4      UK        9600    12000    huge
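One thing to watch with these conditions: a value that matches none of them falls through to the default. The sketch below wraps the same three conditions in a hypothetical function `label_population` (the name and the sample values are assumptions, not from the answer) to show that exactly 2000, and anything between 4000 and 5000, ends up as 'huge'.

```python
import numpy as np
import pandas as pd


def label_population(pop):
    # Hypothetical wrapper around the np.select call from the answer.
    conditions = [
        pop < 2000,
        (pop > 2000) & (pop <= 4000),
        (pop > 5000) & (pop <= 6000),
    ]
    choices = ['small', 'medium', 'big']
    # Rows that match no condition receive the default label.
    return np.select(conditions, choices, default='huge')


# Made-up sample: 2000 and 4500 sit in the gaps between the ranges.
sample = pd.Series([1200, 2000, 4500, 5600, 9600])
labels = label_population(sample)
print(labels.tolist())
```

If the gaps are not intended, changing the second condition to `pop >= 2000` and the third to `pop > 4000` would be one way to close them.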

Answer 1: (score: 0)

The np.select answer from @Chris looks good, but I had already written an answer using pd.cut (see docs), so I might as well post it:

import pandas as pd
df = pd.DataFrame({'COUNTRY':['China','JAPAN','KOREA', 'USA', 'UK'],
               'POPULATION':[1200,2345,3400,5600,9600],
               'ECONOMY':[86212,11862,1000, 8555,12000]})

df["size"] = pd.cut(df["POPULATION"],
                bins=[0, 2000, 4000, 5000, 6000, df.POPULATION.max()],
                labels=["Small", "Medium", "NaN", "Large", "Huge"])

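This is a bit clunky, because you have to handle the gap between 4,000 and 5,000 by writing an arbitrary label (in this case I wrote "NaN", but that's wrong).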
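One possible way around the placeholder label, as a sketch rather than the answerer's method: ask pd.cut for integer bin codes with `labels=False`, then map the codes to labels, so the 4000 to 5000 gap and the top range can share the 'huge' label from the question. The names `edges` and `labels_by_bin` are assumptions, and the infinite outer edges are chosen here so that no value falls outside the bins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                   'POPULATION': [1200, 2345, 3400, 5600, 9600],
                   'ECONOMY': [86212, 11862, 1000, 8555, 12000]})

# Assumed edges: infinite outer bounds so every finite value lands in a bin.
edges = [-np.inf, 2000, 4000, 5000, 6000, np.inf]
# One label per bin, in order; the gap bin and the top bin both map to 'huge'.
labels_by_bin = ['small', 'medium', 'huge', 'big', 'huge']

# labels=False makes pd.cut return the integer position of each value's bin.
codes = pd.cut(df['POPULATION'], bins=edges, labels=False)
df['size'] = codes.map(dict(enumerate(labels_by_bin)))
print(df)
```

Note that pd.cut bins are closed on the right by default, so a population of exactly 2000 is labelled 'small' here, whereas the conditions in the question send it to the fallback branch.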