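I want to be able to call this script to run the extract/transform steps on the raw datasets even when some or all of those datasets are empty.
If they are empty, the final output should simply read NaN.
This report has to run while I'm away, and if the script stops entirely, incomplete information is better than no information.
In this example the input df is completely empty, but often the column types are not what I expect, or some columns are empty while others are not, and so on.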
Input:
Materials Tracking
Empty DataFrame
Columns: ['Testing School DBN', 'EXAM', 'Scoring Site DBN', '# BOXES', 'Last Updated On']
Desired output:
Materials Report
Empty DataFrame
Columns: [Tag, Scoring Site, Scoring Site Including X, Valid Site/Exam?]
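A minimal sketch of one way to produce this output in the fully empty case, assuming an early-return guard is acceptable in the script: check `DataFrame.empty` before any transformation runs and return a frame that only carries the report columns. The helper name `empty_report_if_no_rows` is hypothetical; the column names are taken from the input and output above.

```python
import pandas as pd

# Report columns copied from the desired output; the helper below is hypothetical.
MATERIALS_REPORT_COLUMNS = ['Tag', 'Scoring Site', 'Scoring Site Including X', 'Valid Site/Exam?']


def empty_report_if_no_rows(tracking_df):
    """Return an empty report with the expected columns if the input has no rows, else None."""
    if tracking_df.empty:
        # reindex on an empty frame keeps zero rows but adds the requested columns
        return pd.DataFrame().reindex(columns=MATERIALS_REPORT_COLUMNS)
    return None  # caller continues with the normal transformation


tracking = pd.DataFrame(columns=['Testing School DBN', 'EXAM', 'Scoring Site DBN',
                                 '# BOXES', 'Last Updated On'])
report = empty_report_if_no_rows(tracking)
print(report)
```

This only covers the "everything is empty" case; partially empty or oddly typed columns still go through the regular code path.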
Part of my script:
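import numpy as np
import pandas as pd

# MaterialsTracking_df and Sites_To_Exams_df are loaded earlier in the script
MaterialsReport_df = pd.DataFrame()
MaterialsReport_columns = ['Tag',
                           'Scoring Site',
                           'Scoring Site Including X',
                           'Valid Site/Exam?']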
#tag
MaterialsReport_df['Tag'] = MaterialsTracking_df['Testing School DBN'] + " - " + MaterialsTracking_df['EXAM']
# scoring site
MaterialsReport_df['Scoring Site'] = MaterialsTracking_df['Scoring Site DBN']
#scoring site with x's
MaterialsReport_df['Scoring Site Including X'] = np.where(MaterialsTracking_df['# BOXES'] == "X",
                                                          MaterialsTracking_df['Scoring Site DBN'].astype(str) + " [X]",
                                                          MaterialsTracking_df['Scoring Site DBN'])
# exams for site
MaterialsReport_df['EXAM'] = MaterialsTracking_df['EXAM']
MaterialsReport_df['Exams for this Site'] = MaterialsReport_df.join(Sites_To_Exams_df.set_index('DBN4')['ALL EXAMS'], on='Scoring Site')['ALL EXAMS']
# valid site
def valid_site(exam, exams_for_this_site):
    if exam in exams_for_this_site:
        return 'Y'
    else:
        return 'N'

MaterialsReport_df['Valid Site/Exam?'] = np.vectorize(valid_site)(MaterialsReport_df['EXAM'],
                                                                  MaterialsReport_df['Exams for this Site'])
# reorder columns
MaterialsReport_df = MaterialsReport_df.reindex(columns=MaterialsReport_columns)
When I pass the function through np.vectorize and the DataFrame is empty, I get ValueError: cannot call vectorize on size 0 inputs unless otypes is set.
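The error message itself points at `otypes`. By default `np.vectorize` works out the output type by calling the function on the first element, which is impossible when there are no elements; declaring `otypes` up front skips that step. A sketch, assuming an object-dtype column of 'Y'/'N' is acceptable. The `isinstance` guard is a hypothetical addition: a row whose site is missing from Sites_To_Exams_df comes back from the join as NaN (a float), and `exam in nan` would raise TypeError.

```python
import numpy as np
import pandas as pd


def valid_site(exam, exams_for_this_site):
    # Hypothetical guard: a failed join leaves NaN (a float), which `in` cannot search
    if not isinstance(exams_for_this_site, str):
        return 'N'
    return 'Y' if exam in exams_for_this_site else 'N'


# With otypes declared, vectorize does not need a first element to infer the
# output type, so size-0 inputs are accepted and give a size-0 result.
valid_site_vec = np.vectorize(valid_site, otypes=[object])

empty = pd.Series([], dtype=object)
empty_result = valid_site_vec(empty, empty)

exams = pd.Series(['ELA', 'MATH', 'ELA'])
sites = pd.Series(['ELA, ALG', 'ELA', np.nan])  # made-up sample values
result = valid_site_vec(exams, sites)
print(empty_result.shape, list(result))
```

In the script, this amounts to building the vectorized function with `otypes=[object]` instead of relying on type inference.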
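The best workaround I've found so far (and I don't think it's a good one at all) is to write an error-handling decorator and apply it to a method inside the vectorize class: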
Error-handling decorator:
def get_decorator(errors=(Exception, ), default_value=None):
    """Build a decorator that catches `errors` and returns `default_value` instead."""
    def decorator(func):
        def new_func(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except errors as e:
                # log the exception and fall back instead of stopping the script
                print("Got error! ", repr(e))
                return default_value
        return new_func
    return decorator

error_wrap = get_decorator((ValueError, TypeError))
Edited vectorize._vectorize_call in the NumPy source (example):
@error_suppress.error_wrap
def _vectorize_call(self, func, args):
    """Vectorized call to `func` over positional `args`."""
    if self.signature is not None:
        res = self._vectorize_call_with_signature(func, args)
    elif not args:
        res = func()
    else:
        ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
        # Convert args to object arrays first
        inputs = [array(a, copy=False, subok=True, dtype=object)
                  for a in args]
        outputs = ufunc(*inputs)
        if ufunc.nout == 1:
            res = array(outputs, copy=False, subok=True, dtype=otypes[0])
        else:
            res = tuple([array(x, copy=False, subok=True, dtype=t)
                         for x, t in zip(outputs, otypes)])
    return res
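Patching NumPy's installed source affects every caller and is lost when NumPy is upgraded. A narrower variant of the same idea, sketched here as an alternative rather than a recommendation, applies the question's `error_wrap` to the vectorized callable at this one call site, so only this step falls back to `None`. The sample arrays are made up for illustration.

```python
import numpy as np


def get_decorator(errors=(Exception, ), default_value=None):
    # Same decorator as in the question
    def decorator(func):
        def new_func(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except errors as e:
                print("Got error! ", repr(e))
                return default_value
        return new_func
    return decorator


error_wrap = get_decorator((ValueError, TypeError))


def valid_site(exam, exams_for_this_site):
    return 'Y' if exam in exams_for_this_site else 'N'


# Wrap only this vectorized callable instead of editing numpy's source
safe_valid_site = error_wrap(np.vectorize(valid_site))

empty = np.array([], dtype=object)
result = safe_valid_site(empty, empty)  # vectorize raises ValueError; the wrapper returns None
```

Assigning that `None` to the report column gives a column with no values, which still lets the rest of the script run.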