I have a script with a few UDFs that mainly uses list comprehensions to modify a dataframe:
def createclaimfields(field,master):
    # build the unique (MATTER ID, field) pairs to run the claim lookups on
    print('creating unique matter ids for {} at {}'.format(field,getdt()))
    dfcol = ['MATTER ID',field]
    df = master[dfcol].dropna().drop_duplicates()
    print('created unique matter ids for {} at {}'.format(field,getdt()))
    print('started getting CLASS HYBRID claims for {} at {}'.format(field,getdt()))
    df['{} CLASS HYBRID CLM NO'.format(field)]=[getclasshybrid(clm) for clm in df[field]]
    print('finished getting CLASS HYBRID claims for {} at {}. Found {} matches'.format(field,getdt(),len(df['{} CLASS HYBRID CLM NO'.format(field)])))
    print('started getting HRV claims for {} at {}'.format(field,getdt()))
    df['{} HRV CLM NO'.format(field)]=[gethrv(clm) for clm in df[field]]
    print('finished getting HRV claims for {} at {}. Found {} matches'.format(field,getdt(),len(df['{} HRV CLM NO'.format(field)])))
    print('started getting CC claims for {} at {}'.format(field,getdt()))
    df['{} CC CLM NO'.format(field)]=[getcc(clm) for clm in df[field]]
    print('finished getting CC claims for {} at {}. Found {} matches'.format(field,getdt(),len(df['{} CC CLM NO'.format(field)])))
    print('started getting PASS claims for {} at {}'.format(field,getdt()))
    df['{} PASS CLM NO'.format(field)]=[getpass(clm) for clm in df[field]]
    print('finished getting PASS claims for {} at {}. Found {} matches'.format(field,getdt(),len(df['{} PASS CLM NO'.format(field)])))
    print('merging {} into claimfields at {}'.format(field,getdt()))
    # merge returns a new dataframe, so the result has to be returned to the caller
    master = master.merge(df,how='left',on=['MATTER ID',field])
    print('merged {} into claimfields at {}'.format(field,getdt()))
    return master
fieldlist = ['MATTER NUMBER','MATTER NAME','CLAIM NUMBER LISTING']
mattercol = ['MATTER NUMBER','MATTER NAME','CLAIM NUMBER LISTING','MATTER ID']
claimfields = rawtrans[mattercol].dropna().drop_duplicates().head()
[createclaimfields(field,claimfields) for field in fieldlist]
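Unfortunately, when I call claimfields after running this, I get the original output without the added columns. I'm guessing this is because 'claimfields' refers to the call chain 'rawtrans[mattercol].dropna().drop_duplicates().head()' rather than the actual output of that chain. How can I define claimfields as its own object rather than as a chain of commands derived from the 'rawtrans' df?
Thanks!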
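EDIT: Problem solved! I replaced [createclaimfields(field,claimfields) for field in fieldlist] with the following:
for field in fieldlist:
    claimfields=createclaimfields(field,claimfields)
tl;dr: I wasn't assigning the output dataframe correctly, and I also didn't need a list comp to iterate over each field in fieldlist.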
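To make the difference concrete, here is a minimal sketch with toy data. The helper `add_length_column` and the sample rows are hypothetical stand-ins for `createclaimfields` and `rawtrans`; only the merge-and-return pattern is taken from the post. It suggests that the list comprehension does build the merged frames but throws them away, while the loop rebinds `claimfields` on each pass.

```python
import pandas as pd

def add_length_column(field, master):
    # Hypothetical stand-in for createclaimfields: derive one column and
    # merge it back. merge() returns a NEW DataFrame, and rebinding the
    # local name `master` does not change the caller's variable.
    df = master[['MATTER ID', field]].dropna().drop_duplicates()
    df['{} LEN'.format(field)] = [len(value) for value in df[field]]
    master = master.merge(df, how='left', on=['MATTER ID', field])
    return master

# Toy data (assumed for illustration only).
claimfields = pd.DataFrame({
    'MATTER ID': [1, 2],
    'MATTER NUMBER': ['M-001', 'M-002'],
    'MATTER NAME': ['Alpha', 'Beta'],
})
fieldlist = ['MATTER NUMBER', 'MATTER NAME']

# The results land in a throwaway list; claimfields itself is unchanged.
discarded = [add_length_column(field, claimfields) for field in fieldlist]
columns_after_listcomp = list(claimfields.columns)

# Rebinding the name on each pass keeps the accumulated columns.
for field in fieldlist:
    claimfields = add_length_column(field, claimfields)
columns_after_loop = list(claimfields.columns)

print(columns_after_listcomp)
print(columns_after_loop)
```

Each frame in `discarded` carries only the single column added by its own call, because every call started from the same unmodified `claimfields`.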
EDIT #2 - sample UDF
import numpy as np

def getcc(clm):
    zlist=range(len(clm))
    #create list of prefixes from letterlist and numberlist
    prefixlist = ['AA','AB','AC','AD','AE','AF','GA','GB','GC','GD','GE','GF','ZZ']
    # list of all 8 length substrings for list comprehension below
    # (`and` is used instead of `&`, which binds tighter than `==` and made the test always False)
    clmstrs=[x for x in [clm[z:z+8] for z in zlist] if len(x)==8 and any(p in x[:-2] for p in prefixlist) and sum(c.isalpha() for c in x)==2]
    if len(clmstrs) > 0:
        return clmstrs[0]
    else:
        return np.nan
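One thing to watch in filters like the one in getcc: Python's `&` binds more tightly than `==`, so a chain such as `a & b & n == 2` is evaluated as `(a & b & n) == 2`. The sketch below uses a hypothetical 8-character string and a hypothetical helper `count_alpha` to show how that grouping can turn a matching candidate into a non-match, and how `and` with an explicit comparison avoids it.

```python
def count_alpha(text):
    # Number of alphabetic characters in text.
    return sum(c.isalpha() for c in text)

candidate = 'AA123456'  # hypothetical claim string: prefix 'AA' plus six digits
has_prefix = candidate.startswith('AA')

# Parsed as ((len(candidate) == 8) & has_prefix & count_alpha(candidate)) == 2,
# i.e. (True & True & 2) == 2, which is 0 == 2.
with_ampersand = (len(candidate) == 8) & has_prefix & count_alpha(candidate) == 2

# Logical `and` evaluates each comparison on its own.
with_and = (len(candidate) == 8) and has_prefix and (count_alpha(candidate) == 2)

print(with_ampersand, with_and)
```

Wrapping the last comparison in parentheses would also fix the `&` version, but `and` is the usual choice for plain Python booleans and short-circuits as well.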