Concise way to update values based on column values

Time: 2015-04-10 14:14:06

Tags: python pandas

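Background: I have a DataFrame whose values I need to update according to some very specific conditions. The original implementation I inherited used many nested if statements inside a for loop, which obscured what was going on. With readability as the main concern, I rewrote it as: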

# df['product'] is used rather than df.product: attribute access returns the
# DataFrame.product method, not the column named 'product'.

# Other Widgets
df.loc[(
    (df['product'] == 0) &
    (df.prod_type == 'OtherWidget') &
    (df.region == 'US')
    ), 'product'] = 5

# Supplier X - All clients
df.loc[(
    (df['product'] == 0) &
    (df.region.isin(['UK', 'US'])) &
    (df.supplier == 'X')
    ), 'product'] = 6

# Supplier Y - Client A
df.loc[(
    (df['product'] == 0) &
    (df.region.isin(['UK', 'US'])) &
    (df.supplier == 'Y') &
    (df.client == 'A')
    ), 'product'] = 1

# Supplier Y - Client B
df.loc[(
    (df['product'] == 0) &
    (df.region.isin(['UK', 'US'])) &
    (df.supplier == 'Y') &
    (df.client == 'B')
    ), 'product'] = 3

# Supplier Y - Client C
df.loc[(
    (df['product'] == 0) &
    (df.region.isin(['UK', 'US'])) &
    (df.supplier == 'Y') &
    (df.client == 'C')
    ), 'product'] = 4

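Question: This works, and the conditions are clear (in my opinion), but I'm not entirely happy with it because it takes up a lot of space. Is there any way to improve this in terms of readability/conciseness?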

1 Answer:

Answer 0: (score: 1)

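Following EdChum's suggestion, I created a mask for each condition. The code below goes a bit overboard with the masking, but it gives the general idea.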

# One named boolean mask per condition. df['product'] is used because
# df.product resolves to the DataFrame.product method, not the column.
prod_0   = (df['product'] == 0)
ptype_OW = (df.prod_type == 'OtherWidget')
rgn_UKUS = (df.region.isin(['UK', 'US']))
rgn_US   = (df.region == 'US')
supp_X   = (df.supplier == 'X')
supp_Y   = (df.supplier == 'Y')
clnt_A   = (df.client == 'A')
clnt_B   = (df.client == 'B')
clnt_C   = (df.client == 'C')

# Note: prod_0 is computed once up front, so a row matching two rules
# takes the value of the later rule (the question's version keeps the earlier one).
df.loc[(prod_0 & ptype_OW & rgn_US), 'product']          = 5
df.loc[(prod_0 & rgn_UKUS & supp_X), 'product']          = 6
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_A), 'product'] = 1
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_B), 'product'] = 3
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_C), 'product'] = 4
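
A possible alternative, not part of the original answer: `numpy.select` lets the same rules be written as one list of conditions and a parallel list of values. It picks the first matching condition for each row, which mirrors the sequential updates in the question, where a row already changed from 0 is skipped by later rules. The sketch below is only an illustration; the sample DataFrame is made up, and only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'product':   [0, 0, 0, 0, 0, 7, 0],
    'prod_type': ['OtherWidget', 'Gadget', 'Gadget', 'Gadget',
                  'Gadget', 'OtherWidget', 'OtherWidget'],
    'region':    ['US', 'UK', 'US', 'UK', 'US', 'US', 'FR'],
    'supplier':  ['X', 'X', 'Y', 'Y', 'Y', 'X', 'Z'],
    'client':    ['A', 'B', 'A', 'B', 'C', 'A', 'A'],
})

# Masks shared by several rules.
prod_0   = df['product'] == 0
rgn_UKUS = df['region'].isin(['UK', 'US'])
supp_Y   = df['supplier'] == 'Y'

# Rules in priority order: np.select uses the first condition that is True.
conditions = [
    prod_0 & (df['prod_type'] == 'OtherWidget') & (df['region'] == 'US'),
    prod_0 & rgn_UKUS & (df['supplier'] == 'X'),
    prod_0 & rgn_UKUS & supp_Y & (df['client'] == 'A'),
    prod_0 & rgn_UKUS & supp_Y & (df['client'] == 'B'),
    prod_0 & rgn_UKUS & supp_Y & (df['client'] == 'C'),
]
choices = [5, 6, 1, 3, 4]

# Rows matching no rule keep their current 'product' value.
df['product'] = np.select(conditions, choices, default=df['product'])
```

In this sample the first row satisfies both the "Other Widgets" rule and the "Supplier X" rule; because the conditions are checked in order, it receives 5, as it would with the sequential `df.loc` updates in the question.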