Grouping a float column using pandas

Time: 2019-08-30 15:51:26

Tags: python pandas

I need to group the rows of a pandas DataFrame by weight.

Name      weight(kg)
Person1    4.44
Person2   37.3
Person3   36.38
Person4   39.52
Person5   81.57
Person6   43.55
Person7   91.11
Person8    5
Person9   36.48
Person10  38.25

My code is below. I need to assign each row to a group based on these conditions, but every row ends up with the value "0 to 20".

if 0 <= data_file['weight(kg)'].all() < 20:
    data_file['target'] = "0 to 20%"
if 20 < data_file['weight(kg)'].all() < 40:
    data_file['target'] = "20 to 40%"
if 40 < data_file['weight(kg)'].all() < 60:
    data_file['target'] = "40 to 60%"
if 60 < data_file['weight(kg)'].all() < 80:
    data_file['target'] = "60 to 80%"
if 80 < data_file['weight(kg)'].all() <= 100:
    data_file['target'] = "80 to 100%"

Expected output:

Name     weight(kg) Target
Person1  4.44       0 to 20
Person2  37.3       20 to 40
Person3  36.38      20 to 40
Person4  39.52      20 to 40
Person5  81.57      80 to 100
Person6  43.55      40 to 60
Person7  91.11      80 to 100
Person8  5           0 to 20
Person9  36.48      20 to 40
Person10 38.25      20 to 40

3 answers:

Answer 0 (score: 6)

Use pd.cut:

df.assign(bins = pd.cut(df["weight(kg)"], [0,20,40,60,80,100], 
                        labels=['0 to 20', '20 to 40', '40 to 60', '60 to 80', '80 to 100']))

Result:

      Name   weight(kg) bins
0   Person1     4.44    0 to 20
1   Person2     37.30   20 to 40
2   Person3     36.38   20 to 40
3   Person4     39.52   20 to 40
4   Person5     81.57   80 to 100
5   Person6     43.55   40 to 60
6   Person7     91.11   80 to 100
7   Person8     5.00    0 to 20
8   Person9     36.48   20 to 40
9   Person10    38.25   20 to 40
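For context, `Series.all()` reduces the whole column to a single boolean, which then compares as 1, so the chained `if` tests in the question run once for the entire column rather than once per row. The following is a minimal, self-contained sketch of the `pd.cut` approach on the question's sample data. The column names `Target` and `Target_left_closed` are illustrative choices, not part of the original answer. By default `pd.cut` builds right-closed bins such as (0, 20]; passing `right=False` gives left-closed bins such as [0, 20), which is closer to the `0 <= x < 20` style used in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(1, 11)],
    "weight(kg)": [4.44, 37.3, 36.38, 39.52, 81.57, 43.55, 91.11, 5, 36.48, 38.25],
})

# .all() collapses the column to one boolean, so it cannot drive per-row logic
print(df["weight(kg)"].all())

edges = [0, 20, 40, 60, 80, 100]
labels = ["0 to 20", "20 to 40", "40 to 60", "60 to 80", "80 to 100"]

# Default: right-closed bins (0, 20], (20, 40], ...; exactly 0 falls outside and becomes NaN
df["Target"] = pd.cut(df["weight(kg)"], bins=edges, labels=labels)

# Alternative: left-closed bins [0, 20), [20, 40), ...; exactly 100 falls outside and becomes NaN
df["Target_left_closed"] = pd.cut(
    df["weight(kg)"], bins=edges, labels=labels, right=False
)

print(df)
```

If both 0 and 100 can occur in the data, `include_lowest=True` together with the default right-closed bins keeps both edge values inside a bin.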

Answer 1 (score: 3)

This is quite simple: use pandas' apply with a lambda function:

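def classify(x):
    # Lower bounds are inclusive so that exactly 20, 40, 60 and 80 also get a label;
    # with strict "<" on both sides, y would be unassigned for those values.
    if 0 <= x < 20:
        return "0 to 20%"
    elif 20 <= x < 40:
        return "20 to 40%"
    elif 40 <= x < 60:
        return "40 to 60%"
    elif 60 <= x < 80:
        return "60 to 80%"
    elif 80 <= x <= 100:
        return "80 to 100%"
    return None  # value outside 0-100, or NaN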

Assuming your DataFrame has two columns, "Name" and "weight", you would then write:

df['Target'] = df['weight'].apply(lambda x: classify(x))

I hope this helps.

Extra: if you need a progress bar, you can add the following lines:

from tqdm import tqdm
tqdm.pandas()
df['Target'] = df['weight'].progress_apply(lambda x: classify(x))
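As a rough end-to-end sketch of the apply approach on the question's data, the snippet below uses the question's actual column name `weight(kg)` and passes the function to `apply` directly, since wrapping it in a lambda is not required. The helper name `classify_weight` is hypothetical and chosen only for this illustration; it assumes lower bounds are inclusive so that boundary values such as 20 receive a label.

```python
import pandas as pd

def classify_weight(x):
    """Return a range label for a single weight (hypothetical helper)."""
    if 0 <= x < 20:
        return "0 to 20%"
    elif 20 <= x < 40:
        return "20 to 40%"
    elif 40 <= x < 60:
        return "40 to 60%"
    elif 60 <= x < 80:
        return "60 to 80%"
    elif 80 <= x <= 100:
        return "80 to 100%"
    return None  # outside 0-100, or NaN

# Sample data copied from the question
df = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(1, 11)],
    "weight(kg)": [4.44, 37.3, 36.38, 39.52, 81.57, 43.55, 91.11, 5, 36.48, 38.25],
})

# apply calls classify_weight once per row value
df["Target"] = df["weight(kg)"].apply(classify_weight)
print(df)
```

Because `apply` calls a Python function for every row, it is usually slower on large frames than vectorized options such as `pd.cut` or `np.select`.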

Answer 2 (score: 3)

You can use np.select:

import numpy as np

# Lower bounds are inclusive so that weights of exactly 20, 40, 60 or 80 match a condition.
conditions = [
        (0 <= df['weight(kg)']) & (df['weight(kg)'] < 20)
     ,  (20 <= df['weight(kg)']) & (df['weight(kg)'] < 40)
     ,  (40 <= df['weight(kg)']) & (df['weight(kg)'] < 60)
     ,  (60 <= df['weight(kg)']) & (df['weight(kg)'] < 80)
     ,  (80 <= df['weight(kg)']) & (df['weight(kg)'] <= 100)
]

results = [
    "0 to 20%"
    ,"20 to 40%"
    ,"40 to 60%"
    ,"60 to 80%"
    ,"80 to 100%"
]

df['Target'] = np.select(conditions, results)

Output:

    Name    weight(kg)  Target
0   Person1 4.44        0 to 20%
1   Person2 37.30       20 to 40%
2   Person3 36.38       20 to 40%
3   Person4 39.52       20 to 40%
4   Person5 81.57       80 to 100%