Updating rows in a pandas DataFrame with loc does not work correctly

Time: 2018-09-04 05:31:53

Tags: python python-3.x pandas dataframe

I have a DataFrame named output:

RAW_ENTITY_NAME   ENTITY_TYPE       ENTITY_NAME        IS_MAIN
01-03-2017        TNRMATDT          01 03 2017         1
04-02-2017        TNRSTRTDT         04 02 2017         1
documents         TNRTYPE           SIGHT              1
documents         TNRDOCSBY         NOT FOUND          1
accept            TNRDTL            accept             1 
23                TNRDAYS           23                 1

print(output.dtypes)

RAW_ENTITY_NAME               object
ENTITY_TYPE                   object
ENTITY_NAME                   object
IS_MAIN                       object
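
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. It assumes IS_MAIN holds the string '1' in every row. The object dtype is consistent with that, but it would also allow integers or a mix of both, so the last lines print the Python types actually stored in the column.

```python
import pandas as pd

# Rebuild the sample frame. Storing IS_MAIN as strings is an assumption;
# the dtypes listing only tells us the column is of dtype object.
output = pd.DataFrame({
    'RAW_ENTITY_NAME': ['01-03-2017', '04-02-2017', 'documents',
                        'documents', 'accept', '23'],
    'ENTITY_TYPE': ['TNRMATDT', 'TNRSTRTDT', 'TNRTYPE',
                    'TNRDOCSBY', 'TNRDTL', 'TNRDAYS'],
    'ENTITY_NAME': ['01 03 2017', '04 02 2017', 'SIGHT',
                    'NOT FOUND', 'accept', '23'],
    'IS_MAIN': pd.Series(['1'] * 6, dtype=object),
})

# An object column can hold str, int, or both, so look at the element types.
main_types = output['IS_MAIN'].map(type).unique()
print(main_types)
```

If this prints more than one type, a comparison such as output['IS_MAIN'] == '1' will silently skip the rows that hold the integer 1.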

Note: the combination ENTITY_TYPE = TNRTYPE, ENTITY_NAME = SIGHT, IS_MAIN = 1 occurs only once in the DataFrame.

If ENTITY_TYPE is TNRTYPE, ENTITY_NAME = SIGHT and IS_MAIN = 1, I want to update some values.

temp = output.loc[(output['IS_MAIN'] == 1) & (output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']
temp = temp.reset_index(drop=True)
temp = temp[0]
if (temp == 'SIGHT'):
   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'] == 'TNRDOCSBY'), 'ENTITY_NAME'] = 'PAYMENT'

   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDTL'])),
                                   ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = 'NOT APPLICABLE'

   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDAYS'])),
                                   ['ENTITY_NAME']] = '0'

   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDAYS'])),
                                   ['RAW_ENTITY_NAME']] = ''

   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRSTRTDT'),
                                   ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

   output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRMATDT'),
                                   ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

The final output is:

RAW_ENTITY_NAME   ENTITY_TYPE       ENTITY_NAME        IS_MAIN
01-03-2017        TNRMATDT          01 03 2017         1
04-02-2017        TNRSTRTDT         04 02 2017         1
documents         TNRTYPE           SIGHT              1
documents         TNRDOCSBY         PAYMENT            1
NOT APPLICABLE    TNRDTL            NOT APPLICABLE     1
                  TNRDAYS           0                  1

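As you can see, everything is updated except the first two rows, i.e. the rows with ENTITY_TYPE = TNRMATDT and TNRSTRTDT.

I would like to know why the code below does not give the expected result.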

output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRSTRTDT'),
                                   ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRMATDT'),
                                       ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

I would be glad if someone could spot my mistake or suggest a workaround.

Many thanks.

2 Answers:

Answer 0: (score: 1)

Your solution works fine for me. I rewrote it for readability, so that the same conditions are not repeated:

temp = output.loc[(output['IS_MAIN'] == '1') & 
                  (output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']

# If the values in IS_MAIN are integers, compare with 1 instead:
#temp = output.loc[(output['IS_MAIN'] == 1) & 
#                  (output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']

if (temp.iat[0] == 'SIGHT'):
# A more general check that also works when no row matches the condition:
#if (next(iter(temp), 'not match') == 'SIGHT'):

    m1 = output['IS_MAIN'] == '1'
    # If the values in IS_MAIN are integers:
    #m1 = output['IS_MAIN'] == 1
    m2 = output['ENTITY_TYPE'] == 'TNRDOCSBY'
    m3 = output['ENTITY_TYPE'] == 'TNRDTL'
    m4 = output['ENTITY_TYPE'] == 'TNRDAYS'
    m5 = output['ENTITY_TYPE'].isin(['TNRMATDT','TNRSTRTDT'])

    output.loc[m1 & m2, 'ENTITY_NAME'] = 'PAYMENT'

    output.loc[m1 & m3, ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = 'NOT APPLICABLE'

    output.loc[m1 & m4, ['ENTITY_NAME']] = '0'
    output.loc[m1 & m4, ['RAW_ENTITY_NAME']] = ''

    output.loc[m1 & m5, ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

print (output)
  RAW_ENTITY_NAME ENTITY_TYPE     ENTITY_NAME IS_MAIN
0                    TNRMATDT                       1
1                   TNRSTRTDT                       1
2       documents     TNRTYPE           SIGHT       1
3       documents   TNRDOCSBY         PAYMENT       1
4  NOT APPLICABLE      TNRDTL  NOT APPLICABLE       1
5                     TNRDAYS               0       1
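
The commented-out next(iter(temp), 'not match') variant matters when the first selection comes back empty, which is what happens if the mask compares IS_MAIN against the wrong type. Here is a minimal sketch of that case, assuming IS_MAIN holds strings in an object column. The two-row frame and the names empty, found, first_empty and first_found are made up for illustration.

```python
import pandas as pd

# Small stand-in frame; IS_MAIN is an object column of strings (an assumption).
output = pd.DataFrame({
    'ENTITY_TYPE': ['TNRTYPE', 'TNRDOCSBY'],
    'ENTITY_NAME': ['SIGHT', 'NOT FOUND'],
    'IS_MAIN': pd.Series(['1', '1'], dtype=object),
})

is_type = output['ENTITY_TYPE'] == 'TNRTYPE'

# Comparing the string column with the integer 1 matches no rows.
empty = output.loc[(output['IS_MAIN'] == 1) & is_type, 'ENTITY_NAME']
# Comparing with the string '1' matches the TNRTYPE row.
found = output.loc[(output['IS_MAIN'] == '1') & is_type, 'ENTITY_NAME']

# next(iter(s), default) yields the first value, or the default if s is empty.
# temp.iat[0] would raise on the empty selection instead.
first_empty = next(iter(empty), 'not match')
first_found = next(iter(found), 'not match')
print(first_empty, first_found)
```

With the guard in place, a type mismatch in the first condition skips the updates quietly rather than raising, so it is still worth checking which branch actually ran.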

Answer 1: (score: 1)

I had the same problem. All you need to do is make the IS_MAIN column numeric:

df['IS_MAIN'] = df['IS_MAIN'].astype(int)

That should make it work.
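
One thing to watch after the conversion: the masks then have to compare against the integer 1, not the string '1'. A minimal sketch, assuming IS_MAIN starts out as strings in an object column. The three-row frame and the names mask and stale_matches are made up for illustration.

```python
import pandas as pd

# Small stand-in frame; IS_MAIN starts as strings (an assumption).
df = pd.DataFrame({
    'RAW_ENTITY_NAME': ['01-03-2017', '04-02-2017', 'documents'],
    'ENTITY_TYPE': ['TNRMATDT', 'TNRSTRTDT', 'TNRTYPE'],
    'ENTITY_NAME': ['01 03 2017', '04 02 2017', 'SIGHT'],
    'IS_MAIN': pd.Series(['1', '1', '1'], dtype=object),
})

df['IS_MAIN'] = df['IS_MAIN'].astype(int)

# After the conversion, compare against the integer 1.
mask = (df['IS_MAIN'] == 1) & df['ENTITY_TYPE'].isin(['TNRMATDT', 'TNRSTRTDT'])
df.loc[mask, ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''

# The old string comparison now selects nothing.
stale_matches = int((df['IS_MAIN'] == '1').sum())
print(df)
print(stale_matches)
```

Whichever type you settle on, using the same literal in every mask avoids the situation where some updates apply and others are skipped without an error.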