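Creating a new column while iterating over a pandas dataset (multiple conditions)

Date: 2017-09-22 22:07:35

Tags: python-3.x pandas

I'm trying to create a new column by applying multiple conditions to a column whose dtype is float.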

Sample data:
ID    CO
0   12.0
1   11.0
2    8.0
3    6.5
4    5.5
5    5.7
6    5.8
7    6.5
8    6.8
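To make the snippets in this thread reproducible, here is a minimal sketch that builds the sample data as a DataFrame. It assumes ID is an ordinary column; the second answer's output shows ID as the index instead.

```python
import pandas as pd

# Sample data from the question; ID is kept as a regular column here
df = pd.DataFrame({
    "ID": range(9),
    "CO": [12.0, 11.0, 8.0, 6.5, 5.5, 5.7, 5.8, 6.5, 6.8],
})

print(df.dtypes)  # CO is float64
```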

for index, row in df.iterrows():
    if row['CO'] in arange(0,1.54):
        row.loc['CO_1'] = 'GOOD'
    elif row['CO'] in arange(1.54,1.70):
        row.loc['CO_1'] = 'MOD'

That didn't work, so I tried writing a separate function:

def aqi_CO(row):
    val_1=0
    for x in row:
        if x in arange(0,0.054):
            val_1 = 'GOOD'
        elif x in arange(0.054,0.070):
            val_1 = 'MODERATE'
        elif x in arange(0.070,0.085):
            val_1 = 'UNHEALTHY_SG'
        elif x in arange(0.085,0.105):
            val_1 = 'UNHEALTHY'
        elif x in arange(0.105,0.200):
            val_1 = 'VERY_UNHEALTHY'
        elif x in arange(0.200,3):
            val_1 = 'HAZARDOUS'
        return val_1

and calling it via apply:
df['aqi_CO'] = df.apply(lambda x: aqi_CO(df['CO']), axis=1)

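That didn't work either, and now I'm confused. Can someone help me iterate over the DataFrame row by row, check 3 or 4 conditions, and create a new column from the result?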

2 answers:

Answer 0 (score: 1)

Use pd.cut:

pd.cut(df.CO,bins=[0,2,4,6,8,9,100],labels=["GOOD","MODERATE","UNHEALTHY_SG","UNHEALTHY","VERY_UNHEALTHY","HAZARDOUS"])

Out[866]: 
0       HAZARDOUS
1       HAZARDOUS
2       UNHEALTHY
3       UNHEALTHY
4    UNHEALTHY_SG
5    UNHEALTHY_SG
6    UNHEALTHY_SG
7       UNHEALTHY
8       UNHEALTHY
Name: CO, dtype: category

df['new']=pd.cut(df.CO,bins=[0,2,4,6,8,9,100],labels=["GOOD","MODERATE","UNHEALTHY_SG","UNHEALTHY","VERY_UNHEALTHY","HAZARDOUS"])
df
Out[868]: 
   ID    CO           new
0   0  12.0     HAZARDOUS
1   1  11.0     HAZARDOUS
2   2   8.0     UNHEALTHY
3   3   6.5     UNHEALTHY
4   4   5.5  UNHEALTHY_SG
5   5   5.7  UNHEALTHY_SG
6   6   5.8  UNHEALTHY_SG
7   7   6.5     UNHEALTHY
8   8   6.8     UNHEALTHY
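The bins above are illustrative and do not match the thresholds in the question's aqi_CO function. The sketch below is one way to reuse those thresholds with pd.cut, assuming each category is meant to cover a half-open interval [lower, upper), as the arange ranges suggest. That is why right=False is passed; the default right=True would put 0.054 in GOOD instead. The CO values are made up for illustration: every value in the question's sample data (5.5 to 12.0) is above the last edge of 3 and would come out NaN with these bins.

```python
import pandas as pd

# Hypothetical CO readings chosen to hit every band plus one out-of-range value
df = pd.DataFrame({"CO": [0.01, 0.054, 0.06, 0.08, 0.1, 0.15, 0.5, 5.0]})

# Breakpoints taken from the question's aqi_CO function
bins = [0, 0.054, 0.070, 0.085, 0.105, 0.200, 3]
labels = ["GOOD", "MODERATE", "UNHEALTHY_SG", "UNHEALTHY",
          "VERY_UNHEALTHY", "HAZARDOUS"]

# right=False makes each bin [lower, upper), so 0.054 -> MODERATE;
# values >= 3 fall outside every bin and become NaN
df["aqi_CO"] = pd.cut(df["CO"], bins=bins, labels=labels, right=False)
print(df)
```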

Answer 1 (score: 0)

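In your first snippet, arange(0,1.54) returns array([ 0., 1.]), and none of the sample values is in that array. If you still want to check it that way, you can widen the range and use a smaller step, e.g. arange(0, 7, 0.1). Then, for the assignment inside the for loop, use .loc on the DataFrame with index instead of on row, i.e. df.loc[index,'CO_1'] = 'GOOD' instead of row.loc['CO_1'] = 'GOOD'. The row yielded by iterrows() is generally a copy, so writing to it does not change df.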
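The points about arange can be checked directly. A short sketch, assuming the bare arange in these snippets is numpy.arange: with the default step of 1, arange(0, 1.54) has only two elements, and `in` against a float arange tests exact equality with each generated element, so a value that looks like it is in the range can still be missing. This is probably also why some in-range values such as 5.8 and 6.8 end up NaN in the result shown in this answer. A plain interval comparison avoids both problems.

```python
import numpy as np

# Default step is 1, so only 0.0 and 1.0 are generated
print(np.arange(0, 1.54))  # [0. 1.]

# `in` compares for exact equality; the 4th element is computed as
# 3 * 0.1 == 0.30000000000000004, which is not equal to 0.3
print(0.3 in np.arange(0, 1, 0.1))  # False

# Comparing against the interval bounds behaves as intended
x = 0.3
print(0 <= x < 1)  # True
```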

from numpy import arange  # the bare arange used in this thread is numpy.arange

for index, row in df.iterrows():
    if row['CO'] in arange(0, 7, 0.1):
        df.loc[index,'CO_1'] = 'GOOD'
    elif row['CO'] in arange(1.54,1.70):
        df.loc[index,'CO_1'] = 'MOD'

Result:

     CO  CO_1
ID
0   12.0   NaN
1   11.0   NaN
2    8.0   NaN
3    6.5  GOOD
4    5.5  GOOD
5    5.7  GOOD
6    5.8   NaN
7    6.5  GOOD
8    6.8   NaN

Similarly, for your second snippet, you could use a lambda and apply it to the column only:

df['aqi_CO'] = df['CO'].apply(lambda x: aqi_CO(x))

Now, since only the column value is passed in, the function can check it directly without iterating (note: the range in the function's first case has been changed so that some output is visible):

def aqi_CO(x):
    val_1=0

    if x in arange(0,7, 0.1):
        val_1 = 'GOOD'
    elif x in arange(0.054,0.070):
        val_1 = 'MODERATE'
    elif x in arange(0.070,0.085):
        val_1 = 'UNHEALTHY_SG'
    elif x in arange(0.085,0.105):
        val_1 = 'UNHEALTHY'
    elif x in arange(0.105,0.200):
        val_1 = 'VERY_UNHEALTHY'
    elif x in arange(0.200,3):
        val_1 = 'HAZARDOUS'
    return val_1
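For comparison, a sketch of the second snippet written with plain interval comparisons instead of arange membership, still applied element-wise with Series.apply. It assumes the same half-open thresholds as the question's function; the helper name aqi_co_by_range, the BANDS table and returning None for values outside [0, 3) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# (upper bound, label) pairs from the question's thresholds;
# each band covers [previous upper bound, upper)
BANDS = [
    (0.054, "GOOD"),
    (0.070, "MODERATE"),
    (0.085, "UNHEALTHY_SG"),
    (0.105, "UNHEALTHY"),
    (0.200, "VERY_UNHEALTHY"),
    (3.0, "HAZARDOUS"),
]

def aqi_co_by_range(x):
    """Return the AQI label for one CO value, or None if it is outside [0, 3)."""
    if x < 0:
        return None
    for upper, label in BANDS:
        if x < upper:
            return label
    return None  # x >= 3.0

# Hypothetical readings for illustration
df = pd.DataFrame({"CO": [0.02, 0.06, 0.09, 0.15, 1.5, 6.5]})
# Series.apply passes one scalar at a time, so no loop inside the function
df["aqi_CO"] = df["CO"].apply(aqi_co_by_range)
print(df)
```

Because the bands are checked in ascending order, each value only needs to be compared with upper bounds, and a boundary value such as 0.054 lands in the higher band.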