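Pandas vectorized methods let you perform many operations in a single line, which pushes line lengths well past the usual limit. How can the PEP guidelines be reconciled with long pandas lines?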
PEP 8 recommends limiting Python lines to 79 characters (72 for docstrings and comments).
A pandas line can look like this:
df['VALUE_EXPRESSED'] = np.where((df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN']=='EO AUTOMATED ABS') & (df['UNIT_AS_EXPECTED']=='cells/mcl'),df['VALUE_EXPRESSED']*1000,df['VALUE_EXPRESSED'] )
or
query = df.groupby(['TEST_HOSPITAL_CONCEPT_NAME_CLEAN', 'UNIT_AS_EXPECTED_TRANSFORMED', 'NUMERATOR','DENOMINATOR']).size().reset_index(name='COUNT')
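I cannot change the column header names, and I think that using variables to shorten the names would make the code less explicit and less readable.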
Answer 0: (score: 6)
What you are describing is method chaining.
There are a few ways to break such a line up:
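A trailing backslash (\) continues a line without any brackets. The example below instead wraps the whole chain in parentheses: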
query = (
    df
    .groupby(
        [
            'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
            'UNIT_AS_EXPECTED_TRANSFORMED',
            'NUMERATOR',
            'DENOMINATOR',
        ]
    )
    .size()
    .reset_index(name='COUNT')
)
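For comparison, the backslash form mentioned above might look like the sketch below. The small DataFrame is made-up sample data, included only so the snippet runs on its own; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    'TEST_HOSPITAL_CONCEPT_NAME_CLEAN': [
        'EO AUTOMATED ABS', 'EO AUTOMATED ABS', 'OTHER',
    ],
    'UNIT_AS_EXPECTED_TRANSFORMED': ['cells/mcl', 'cells/mcl', 'g/dl'],
    'NUMERATOR': ['cells', 'cells', 'g'],
    'DENOMINATOR': ['mcl', 'mcl', 'dl'],
})

# Backslash continuation: every line except the last ends with "\".
# Nothing may follow the backslash, not even a space or a comment.
query_backslash = df \
    .groupby([
        'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
        'UNIT_AS_EXPECTED_TRANSFORMED',
        'NUMERATOR',
        'DENOMINATOR',
    ]) \
    .size() \
    .reset_index(name='COUNT')

# Parenthesized form: the open bracket lets the expression span lines.
query_parens = (
    df
    .groupby([
        'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
        'UNIT_AS_EXPECTED_TRANSFORMED',
        'NUMERATOR',
        'DENOMINATOR',
    ])
    .size()
    .reset_index(name='COUNT')
)

# Both spellings build the same table.
print(query_backslash.equals(query_parens))
```

PEP 8 itself prefers implied continuation inside parentheses, brackets, or braces over backslashes, partly because a stray character after a backslash is a syntax error.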
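Answer 1: (score: 2)
Also consider pulling very long subexpressions out into intermediate variables. For example, you could rewrite the line: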
df['VALUE_EXPRESSED'] = np.where((df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN']=='EO AUTOMATED ABS') & (df['UNIT_AS_EXPECTED']=='cells/mcl'),df['VALUE_EXPRESSED']*1000,df['VALUE_EXPRESSED'] )
as:
cond = (
    (df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN'] == 'EO AUTOMATED ABS') &
    (df['UNIT_AS_EXPECTED'] == 'cells/mcl')
)
df['VALUE_EXPRESSED'] = np.where(
    cond,
    df['VALUE_EXPRESSED'] * 1000,
    df['VALUE_EXPRESSED'],
)
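To see the rewrite in action, here is a minimal sketch on made-up sample rows (the values are hypothetical; only the column names come from the question). It also shows a `.loc` update as a possible alternative that is not part of the answer above: once the condition has a name, only the matching rows need to be touched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    'TEST_HOSPITAL_CONCEPT_NAME_CLEAN': [
        'EO AUTOMATED ABS', 'EO AUTOMATED ABS', 'OTHER',
    ],
    'UNIT_AS_EXPECTED': ['cells/mcl', 'k/mcl', 'cells/mcl'],
    'VALUE_EXPRESSED': [0.5, 0.2, 3.0],
})

# Name the condition once so the assignment itself stays short.
cond = (
    (df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN'] == 'EO AUTOMATED ABS')
    & (df['UNIT_AS_EXPECTED'] == 'cells/mcl')
)

# np.where form from the answer, applied to a copy for comparison.
via_where = df.copy()
via_where['VALUE_EXPRESSED'] = np.where(
    cond,
    via_where['VALUE_EXPRESSED'] * 1000,
    via_where['VALUE_EXPRESSED'],
)

# Alternative (an assumption, not from the answer): scale only the
# rows where the condition holds and leave the rest untouched.
via_loc = df.copy()
via_loc.loc[cond, 'VALUE_EXPRESSED'] *= 1000

print(via_where['VALUE_EXPRESSED'].tolist())
print(via_loc['VALUE_EXPRESSED'].tolist())
```

Only the first row satisfies both tests, so it is the only value multiplied by 1000 in either version.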