Python Pandas df.ix not performing as expected

Time: 2016-09-06 22:11:08

Tags: python pandas testing

Here is my function:

def clean_zipcodes(df):
    df.ix[df['workCountryCode'].str.contains('USA') & \
          df['workZipcode'].astype(str).str.len() > 5, 'workZipcode'] = \
        df['workZipcode'].astype(int).floordiv(10000)

    df.ix[df['contractCountryCode'].str.contains('USA') & \
          df['contractZipcode'].astype(str).str.len() > 5, 'contractZipcode'] = \
        df['contractZipcode'].astype(int).floordiv(10000)

    return df

Here is the test function showing what I expect:

def test_clean_zipcodes():
    testDf = pandas.DataFrame({'unique_transaction_id'  : ['1', '1', '1'],
                               'workZipcode'            : [838431000, 991631000, 99163],
                               'contractZipcode'        : [838431000, 991631000, 99163],
                               'workCountryCode'        : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF'],
                               'contractCountryCode'    : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF']})

    resultDf = pandas.DataFrame({'unique_transaction_id'    : ['1', '1', '1'],
                                 'workZipcode'              : [83843, 991631000, 99163],
                                 'contractZipcode'          : [83843, 991631000, 99163],
                                 'workCountryCode'          : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF'],
                                 'contractCountryCode'      : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF']})

    assert resultDf.equals(clean_zipcodes(testDf))

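Apart from any indentation problems (the code did not convert cleanly to SO formatting), df.ix does not perform as expected: it applies no transformation to either the contractZipcode or the workZipcode column. As resultDf shows, the value in the first row should be changed to 83843.

Thanks in advance!

1 answer:

Answer 0 (score: 1)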

Note that you get an empty slice back when you try to index:

In [2]: import pandas as pd


In [3]: testDf = pd.DataFrame({'unique_transaction_id'  : ['1', '1', '1'],
   ...:                            'workZipcode'            : [838431000, 991631000, 99163],
   ...:                            'contractZipcode'        : [838431000, 991631000, 99163],
   ...:                            'workCountryCode'        : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF'],
   ...:                            'contractCountryCode'    : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF']}
   ...:                      )
   ...:
   ...: resultDf = pd.DataFrame({'unique_transaction_id'    : ['1', '1', '1'],
   ...:                               'workZipcode'             : [83843, 991631000, 99163],
   ...:                               'contractZipcode'         : [83843, 991631000, 99163],
   ...:                               'workCountryCode'         : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF'],
   ...:                               'contractCountryCode'     : ['USA: STUFF', 'NONE: STUFF', 'USA: STUFF']})
   ...:
   ...:
   ...:

In [4]: testDf.ix[testDf['workCountryCode'].str.contains('USA') &
                  testDf['workZipcode'].astype(str).str.len() > 5,
                  'workZipcode']
Out[4]: Series([], Name: workZipcode, dtype: int64)

If you add parentheses around the different filters:

In [5]: testDf.ix[(testDf['workCountryCode'].str.contains('USA')) &
                  (testDf['workZipcode'].astype(str).str.len() > 5),
                  'workZipcode']
Out[5]:
0    838431000
Name: workZipcode, dtype: int64

you get what you want. It also makes no difference if you use loc instead of ix; without the parentheses the result is just as empty:

In [6]: testDf.loc[testDf['workCountryCode'].str.contains('USA') &
                   testDf['workZipcode'].astype(str).str.len() > 5,
                   'workZipcode']
Out[6]: Series([], Name: workZipcode, dtype: int64)

So here is the cleaned-up function. I added some small lambdas for readability.
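
The parentheses matter because Python's `&` operator binds more tightly than comparison operators such as `>`, so `a & b > 5` is read as `(a & b) > 5` rather than `a & (b > 5)`. As a minimal sketch of a cleaned-up function along those lines (not necessarily the answerer's own code), the version below builds each condition separately before combining them. The helper names `is_usa` and `is_long` are assumptions made for illustration, and `.loc` is used instead of `.ix` because `.ix` was removed in pandas 1.0.

```python
import pandas as pd


def clean_zipcodes(df):
    """Cut 9-digit USA ZIP codes down to their first 5 digits (modifies df)."""
    # Hypothetical helper lambdas; the names are not from the original answer.
    is_usa = lambda col: df[col].str.contains('USA')
    is_long = lambda col: df[col].astype(str).str.len() > 5

    for country_col, zip_col in [('workCountryCode', 'workZipcode'),
                                 ('contractCountryCode', 'contractZipcode')]:
        # Each condition is evaluated on its own, then combined with `&`,
        # so the `>` comparison is never applied to the combined result.
        mask = is_usa(country_col) & is_long(zip_col)
        # Integer-divide only the selected rows, e.g. 838431000 // 10000 == 83843.
        df.loc[mask, zip_col] = df.loc[mask, zip_col].astype(int).floordiv(10000)
    return df
```

With the testDf from the question, only the first row matches both conditions, so its two ZIP code columns become 83843 and the other rows are left unchanged.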