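I have been assigned to debug a long-running Python script in prod. The problem is that the script contains no print or debug statements. I want to check whether Python maintains any internal logging that tracks which command is currently running.
For example, my current script is below. The location holds huge files, and the process runs once for each file. The script has been running for the last 14 hours, and I can't find out which command is currently executing. So any internal Python log that tracks the currently running command would be helpful. I just need help with the log file directory, or with how to find such a log file directory.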
...
# Read data
for fname in glob('<location>*'):
    df = pd.read_csv(fname,header=None,sep=',')
    #here needs to modify the trigger feature every time
    df.columns = [colnames]
    df = df[df.cps_count>0].replace(r'\s+',np.nan,regex=True).replace('\\N',np.nan)
    # df = df[df.cps_count>0].replace(r'\s+',np.nan,regex=True).replace('\\N',np.nan)
    df = pd.get_dummies(df,columns=['prim_ppt']).fillna(np.nan)
    cols_obj = df.columns[df.dtypes.eq('object')]
    df[cols_obj] = df[cols_obj].apply(pd.to_numeric, errors='coerce')
    #this = pd.concat([ids.reset_index()for i in np.setdiff1d(cols,df.columns):df[i] = 0,pd.DataFrame(scores)],axis=1)
    xref_ids = df['cust_xref_id']
    for i in np.setdiff1d(cols,df.columns):
        df[i] = 0
    #xre_id needs to be replaced
    #feature_importance also needs to be replaced
    feature_importance = model.predict(xgb.DMatrix(df[[i for i in cols]],df['res'],missing=np.nan),pred_contribs=True)
    combined=np.c_[feature_importance,xref_ids]
    df_result=pd.DataFrame(combined,columns=cols+['bias_term','cust_xref_id'])
    dfs.append(df_result)
final= pd.concat(dfs,axis=0)
#need to adjust for every model
df_result2=final[[colnames]]
#need to adjust for every model
df_rank=df_result2[[<somecolnames>]].rank(axis=1,method='first',numeric_only=None,na_option='keep',ascending=False,pct=False)
df_rank['cust_xref_id']=df_result2['cust_xref_id']
#drop the null cust_xref_id
df_rank=df_rank[df_rank['cust_xref_id'].notnull()]
df_rank['cust_xref_id']=df_rank['cust_xref_id'].astype('int')
#Data transformation
#transform from wide to long type
df_final=pd.melt(df_rank,id_vars=['cust_xref_id'],value_vars=[<somecolnames>])
df_final = df_final.sort_values(by=["cust_xref_id", "variable"])
#Note: output file has to be tab separated -- according to Sahil from CSP core team
if not os.path.exists('<outputPath>'):
    os.makedirs('<outputPath>')
df_final.to_csv('<outputfile>',index=None,header=None,sep='\t')
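
For illustration, this is roughly the kind of per-file tracking the script currently lacks. It is only a sketch using the standard `logging` module. `make_logger`, `tracked`, the logger name "progress", and the `log_path` parameter are hypothetical names that are not part of the production script, and it would only help on the next run, not for the process that is already running.

```python
import logging
import time


def make_logger(log_path=None):
    """Return a logger that writes timestamped progress lines.

    log_path is a hypothetical file path; when it is None, lines go to stderr.
    """
    logger = logging.getLogger("progress")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate lines if called more than once
        handler = logging.FileHandler(log_path) if log_path else logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def tracked(items, logger):
    """Yield each item, logging when its processing starts and how long it took."""
    items = list(items)
    total = len(items)
    for index, item in enumerate(items, start=1):
        logger.info("start %d/%d: %s", index, total, item)
        started = time.monotonic()
        yield item  # the caller's loop body runs here
        elapsed = time.monotonic() - started
        logger.info("done  %d/%d: %s (%.1f s)", index, total, item, elapsed)
```

Under these assumptions, the read loop would become `for fname in tracked(glob('<location>*'), make_logger()):`, so the last "start" line in the log shows which file is being processed.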