Pandas vs. the PEP style guide

Time: 2019-05-06 23:46:24

Tags: python pandas

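Pandas vectorized methods let you perform many operations in a single statement, which produces lines far longer than usual. How can the PEP guidelines be reconciled with long pandas lines?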

PEP 8 recommends limiting Python lines to 79 characters (72 for comments and docstrings).

Pandas lines can look like this:

df['VALUE_EXPRESSED'] = np.where((df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN']=='EO AUTOMATED ABS') & (df['UNIT_AS_EXPECTED']=='cells/mcl'),df['VALUE_EXPRESSED']*1000,df['VALUE_EXPRESSED'] )

query = df.groupby(['TEST_HOSPITAL_CONCEPT_NAME_CLEAN', 'UNIT_AS_EXPECTED_TRANSFORMED', 'NUMERATOR','DENOMINATOR']).size().reset_index(name='COUNT')

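I cannot change the column header names, and I think that using variables to shorten the names would make the code less explicit and less readable.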

2 answers:

Answer 0: (score: 6)

What you are describing is method chaining.

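There are a couple of ways to break the chain across lines:

  • Wrap the whole expression in parentheses (as shown below)
  • Use \ for line continuation without parentheses

Example: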

query = (df
    .groupby(
        [
            'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
            'UNIT_AS_EXPECTED_TRANSFORMED',
            'NUMERATOR',
            'DENOMINATOR'
        ]
    )
    .size()
    .reset_index(name='COUNT')
)
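
The second option, backslash continuation, might look like the sketch below. The sample DataFrame is made up for illustration; only the column names come from the question. Note that a backslash must be the very last character on its line, so trailing whitespace breaks it, which is one reason the parenthesized form is usually preferred.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'TEST_HOSPITAL_CONCEPT_NAME_CLEAN': ['A', 'A', 'B'],
    'UNIT_AS_EXPECTED_TRANSFORMED': ['u1', 'u1', 'u2'],
    'NUMERATOR': ['n', 'n', 'n'],
    'DENOMINATOR': ['d', 'd', 'd'],
})

# Backslash continuation: each backslash joins the line to the next one.
# Inside the [...] list no backslash is needed.
query_backslash = df \
    .groupby([
        'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
        'UNIT_AS_EXPECTED_TRANSFORMED',
        'NUMERATOR',
        'DENOMINATOR',
    ]) \
    .size() \
    .reset_index(name='COUNT')

# The parenthesized form, for comparison; it builds the same result.
query_parens = (
    df
    .groupby([
        'TEST_HOSPITAL_CONCEPT_NAME_CLEAN',
        'UNIT_AS_EXPECTED_TRANSFORMED',
        'NUMERATOR',
        'DENOMINATOR',
    ])
    .size()
    .reset_index(name='COUNT')
)

print(query_backslash)
```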

Answer 1: (score: 2)

Also consider moving very long subexpressions into intermediate variables. For example, you could rewrite the line:

df['VALUE_EXPRESSED'] = np.where((df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN']=='EO AUTOMATED ABS') & (df['UNIT_AS_EXPECTED']=='cells/mcl'),df['VALUE_EXPRESSED']*1000,df['VALUE_EXPRESSED'] )

as:

cond = (
    (df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN'] == 'EO AUTOMATED ABS') &
    (df['UNIT_AS_EXPECTED'] == 'cells/mcl')
)
df['VALUE_EXPRESSED'] = np.where(
    cond,
    df['VALUE_EXPRESSED'] * 1000,
    df['VALUE_EXPRESSED'],
)
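
A possible variation on the same idea, not part of the answer above: name each half of the condition, then update only the matching rows with `.loc` instead of `np.where`. The sample values and the names `is_eo_abs` and `is_cells_per_mcl` are assumptions made up for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'TEST_HOSPITAL_CONCEPT_NAME_CLEAN': [
        'EO AUTOMATED ABS',
        'EO AUTOMATED ABS',
        'OTHER TEST',
    ],
    'UNIT_AS_EXPECTED': ['cells/mcl', 'other unit', 'cells/mcl'],
    'VALUE_EXPRESSED': [0.5, 0.2, 3.0],
})

# Each half of the condition gets a descriptive (illustrative) name.
is_eo_abs = df['TEST_HOSPITAL_CONCEPT_NAME_CLEAN'] == 'EO AUTOMATED ABS'
is_cells_per_mcl = df['UNIT_AS_EXPECTED'] == 'cells/mcl'
cond = is_eo_abs & is_cells_per_mcl

# Result of the np.where form, computed first for comparison.
expected = np.where(
    cond,
    df['VALUE_EXPRESSED'] * 1000,
    df['VALUE_EXPRESSED'],
)

# In-place alternative: scale only the rows where the condition holds.
df.loc[cond, 'VALUE_EXPRESSED'] *= 1000

print(df['VALUE_EXPRESSED'].tolist())
```

Whether the named masks read better than a single `cond` is a matter of taste; they keep the full column names visible while keeping every line short.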