我有一个名为output的数据框-
RAW_ENTITY_NAME ENTITY_TYPE ENTITY_NAME IS_MAIN
01-03-2017 TNRMATDT 01 03 2017 1
04-02-2017 TNRSTRTDT 04 02 2017 1
documents TNRTYPE SIGHT 1
documents TNRDOCSBY NOT FOUND 1
accept TNRDTL accept 1
23 TNRDAYS 23 1
print(df.dtypes())
RAW_ENTITY_NAME object
ENTITY_TYPE object
ENTITY_NAME object
IS_MAIN object
注意-ENTITY_TYPE = TNRTYPE
,ENTITY_NAME = SIGHT
和IS_MAIN = 1
在数据框中只会出现一次。
如果ENTITY_TYPE为TNRTYPE,ENTITY_NAME = SIGHT AND IS_MAIN = 1,我想更新一些值。
temp = output.loc[(output['IS_MAIN'] == 1) & (output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']
temp = temp.reset_index(drop=True)
temp = temp[0]
if (temp == 'SIGHT'):
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'] == 'TNRDOCSBY'), 'ENTITY_NAME'] = 'PAYMENT'
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDTL'])),
['ENTITY_NAME', 'RAW_ENTITY_NAME']] = 'NOT APPLICABLE'
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDAYS'])),
['ENTITY_NAME']] = '0'
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE'].isin(['TNRDAYS'])),
['RAW_ENTITY_NAME']] = ''
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRSTRTDT'),
['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRMATDT'),
['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''
最终输出是-
RAW_ENTITY_NAME ENTITY_TYPE ENTITY_NAME IS_MAIN
01-03-2017 TNRMATDT 01 03 2017 1
04-02-2017 TNRSTRTDT 04 02 2017 1
documents TNRTYPE SIGHT 1
documents TNRDOCSBY PAYMENT 1
NOT APPLICABLE TNRDTL NOT APPLICABLE 1
TNRDAYS 0 1
您可以看到,除了前两行外,所有内容都在更新,即ENTITY_TYPE = TNRMATDT和TNRSTRTDAT。
我想知道为什么下面的代码没有给出期望的结果。
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRSTRTDT'),
['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''
output.loc[(output['IS_MAIN'] == '1') & (output['ENTITY_TYPE']=='TNRMATDT'),
['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''
如果有人可以发现我犯的错误或告诉我任何解决方法,我会很高兴。
非常感谢。
答案 0 :(得分:1)
对我来说,您的解决方案效果很好,我尝试将其重写以提高可读性,而不是重复相同的条件:
temp = output.loc[(output['IS_MAIN'] == '1') &
(output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']
#if values in IS_MAIN are integers
#temp = output.loc[(output['IS_MAIN'] == 1) &
# (output['ENTITY_TYPE'] == 'TNRTYPE'), 'ENTITY_NAME']
if (temp.iat[0] == 'SIGHT'):
#more general working if not match condition
#if (next(iter(temp), 'not match') == 'SIGHT'):
m1 = output['IS_MAIN'] == '1'
#if values in IS_MAIN are integers
#m1 = output['IS_MAIN'] == 1
m2 = output['ENTITY_TYPE'] == 'TNRDOCSBY'
m3 = output['ENTITY_TYPE'] == 'TNRDTL'
m4 = output['ENTITY_TYPE'] == 'TNRDAYS'
m5 = output['ENTITY_TYPE'].isin(['TNRMATDT','TNRSTRTDT'])
output.loc[m1 & m2, 'ENTITY_NAME'] = 'PAYMENT'
output.loc[m1 & m3, ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = 'NOT APPLICABLE'
output.loc[m1 & m4, ['ENTITY_NAME']] = '0'
output.loc[m1 & m4, ['RAW_ENTITY_NAME']] = ''
output.loc[m1 & m5, ['ENTITY_NAME', 'RAW_ENTITY_NAME']] = ''
print (output)
RAW_ENTITY_NAME ENTITY_TYPE ENTITY_NAME IS_MAIN
0 TNRMATDT 1
1 TNRSTRTDT 1
2 documents TNRTYPE SIGHT 1
3 documents TNRDOCSBY PAYMENT 1
4 NOT APPLICABLE TNRDTL NOT APPLICABLE 1
5 TNRDAYS 0 1
答案 1 :(得分:1)
我有同样的问题。您要做的就是将IS_MAIN列设为数字
df['IS_MAIN'] = df['IS_MAIN'].astype(int)
这应该使它工作。