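I want to classify numbers into categories based on ranges that I define myself.
A lambda is easy for a single condition, but what should I do when there are more than two conditions? I tried the loop below, but it doesn't change anything: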
country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})
for x in country.POPULATION:
    if x < 2000:
        x = 'small'
    elif x > 2000 and x <= 4000:
        x = 'medium'
    elif x > 5000 and x <= 6000:
        x = 'big'
    else:
        'huge'
I would like the data to come back labelled "small", "medium", and so on, according to the range each value falls in.
Answer 0 (score: 1)
I would use np.select to handle multiple conditions:
import numpy as np

# One boolean mask per category; np.select uses the first mask that matches.
conditions = [
    country['POPULATION'] < 2000,
    (country['POPULATION'] > 2000) & (country['POPULATION'] <= 4000),
    (country['POPULATION'] > 5000) & (country['POPULATION'] <= 6000),
]
choices = [
    'small',
    'medium',
    'big',
]
# Create a new column, or assign the result to an existing one.
# The last argument to np.select is the default for rows matching no condition.
country['new'] = np.select(conditions, choices, 'huge')
  COUNTRY  POPULATION  ECONOMY     new
0   China        1200    86212   small
1   JAPAN        2345    11862  medium
2   KOREA        3400     1000  medium
3     USA        5600     8555     big
4      UK        9600    12000    huge
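As a side note on why the loop in the question has no effect: assigning to `x` only rebinds the loop variable and never writes back into the DataFrame. If you prefer to keep the if/elif style, one possible sketch is to move it into a function and apply it to the column. The helper name `classify` and the column name `size` are made up for illustration, and the ranges are copied from the question as written (so exactly 2000, and anything between 4000 and 5000, falls through to 'huge').

```python
import pandas as pd

country = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                        'POPULATION': [1200, 2345, 3400, 5600, 9600],
                        'ECONOMY': [86212, 11862, 1000, 8555, 12000]})


def classify(population):
    """Hypothetical helper: return a size label using the question's ranges."""
    if population < 2000:
        return 'small'
    elif 2000 < population <= 4000:
        return 'medium'
    elif 5000 < population <= 6000:
        return 'big'
    # Everything else, including the uncovered gaps, ends up here.
    return 'huge'


# apply() calls classify once per value and returns a new Series,
# which is then stored as a column (unlike rebinding x inside a loop).
country['size'] = country['POPULATION'].apply(classify)
```

This is usually slower than np.select on large frames because it runs Python code per row, but it may be easier to read when the rules get complicated.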
Answer 1 (score: 0)
np.select looks good, but I had already written an answer using pd.cut (see docs), so I might as well post it:
import pandas as pd

df = pd.DataFrame({'COUNTRY': ['China', 'JAPAN', 'KOREA', 'USA', 'UK'],
                   'POPULATION': [1200, 2345, 3400, 5600, 9600],
                   'ECONOMY': [86212, 11862, 1000, 8555, 12000]})
df["size"] = pd.cut(df["POPULATION"],
                    bins=[0, 2000, 4000, 5000, 6000, df.POPULATION.max()],
                    labels=["Small", "Medium", "NaN", "Large", "Huge"])
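It's a bit funky, because you have to handle the gap between 4,000 and 5,000 by writing in an arbitrary label (in this case I wrote "NaN", which is just a string and not a real missing value, so it is wrong).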