Suppress errors when passing functions over empty columns/DataFrames

Time: 2019-08-23 17:26:38

Tags: python pandas

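I'd like to be able to call this script to run the extract/transform steps on the raw datasets even when some or all of those datasets are empty.

If they are empty, the final output should simply read NaN.

The report has to run while I'm not around, and if the script halts entirely, incomplete information is better than no information at all.

In this example the input df is completely empty, but often the column types are not what I expect, or some columns are empty while others are not, and so on.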

Input:

Materials Tracking
Empty DataFrame
Columns: ['Testing School DBN', 'EXAM', 'Scoring Site DBN', '# BOXES',
          'Last Updated On']
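To reproduce the empty input for testing, the sketch below builds it directly. It assumes only the column names shown in the printout above. pandas gives object dtype to columns created without data, which may differ from the real file's types.

```python
import pandas as pd

# Column names copied from the "Materials Tracking" printout in the question.
TRACKING_COLUMNS = ['Testing School DBN', 'EXAM', 'Scoring Site DBN',
                    '# BOXES', 'Last Updated On']

# A zero-row frame: the columns exist, but they hold no data.
MaterialsTracking_df = pd.DataFrame(columns=TRACKING_COLUMNS)

print(MaterialsTracking_df)
```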

Desired output:

Materials Report
Empty DataFrame
Columns: [Tag, Scoring Site, Scoring Site Including X, Valid Site/Exam?]
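The desired output is a zero-row frame containing the four report columns. Calling DataFrame.reindex(columns=...), which the script below also uses as its final step, already produces that shape from a completely empty frame. When rows do exist, reindex fills any column it has to create with NaN. The one-row frame and its Tag value in this sketch are hypothetical.

```python
import pandas as pd

MaterialsReport_columns = ['Tag',
                           'Scoring Site',
                           'Scoring Site Including X',
                           'Valid Site/Exam?']

# reindex(columns=...) keeps the listed columns, adds missing ones filled with NaN,
# and drops any column that is not in the list.
empty_report = pd.DataFrame().reindex(columns=MaterialsReport_columns)

# Hypothetical one-row frame with only a Tag: the other report columns come out as NaN.
partial_report = pd.DataFrame({'Tag': ['01M001 - EXAM1']}).reindex(columns=MaterialsReport_columns)

print(empty_report)
print(partial_report)
```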

Part of my script:

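import numpy as np
import pandas as pd

# MaterialsTracking_df and Sites_To_Exams_df are loaded earlier in the script (not shown).
MaterialsReport_df = pd.DataFrame()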

MaterialsReport_columns = ['Tag',
                           'Scoring Site',
                           'Scoring Site Including X',
                           'Valid Site/Exam?']
#tag
MaterialsReport_df['Tag'] = MaterialsTracking_df['Testing School DBN'] + " - " + MaterialsTracking_df['EXAM']

# scoring site
MaterialsReport_df['Scoring Site'] = MaterialsTracking_df['Scoring Site DBN']

#scoring site with x's
MaterialsReport_df['Scoring Site Including X'] = np.where(MaterialsTracking_df['# BOXES'] == "X",
                                                          MaterialsTracking_df['Scoring Site DBN'].astype(str) + " [X]",
                                                          MaterialsTracking_df['Scoring Site DBN'])

# exams for site
MaterialsReport_df['EXAM'] = MaterialsTracking_df['EXAM']
MaterialsReport_df['Exams for this Site'] = MaterialsReport_df.join(Sites_To_Exams_df.set_index('DBN4')['ALL EXAMS'], on='Scoring Site')['ALL EXAMS']

# valid site
def valid_site(exam, exams_for_this_site):
    if exam in exams_for_this_site:
        return 'Y'
    else:
        return 'N'

MaterialsReport_df['Valid Site/Exam?'] = np.vectorize(valid_site)(MaterialsReport_df['EXAM'],
                                                                  MaterialsReport_df['Exams for this Site'])

# reorder columns
MaterialsReport_df = MaterialsReport_df.reindex(columns=MaterialsReport_columns)
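To find which step actually fails on empty input, the sketch below runs the script's steps before the np.vectorize call on zero-row frames. The question shows only the column names of MaterialsTracking_df and Sites_To_Exams_df, so both stand-ins are hypothetical and built empty here (the "all datasets empty" case). Under that assumption, the string concatenation, np.where and join steps all return zero-length results without raising.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: only the column names come from the question.
MaterialsTracking_df = pd.DataFrame(columns=['Testing School DBN', 'EXAM', 'Scoring Site DBN',
                                             '# BOXES', 'Last Updated On'])
Sites_To_Exams_df = pd.DataFrame(columns=['DBN4', 'ALL EXAMS'])

MaterialsReport_df = pd.DataFrame()

# String concatenation on empty object columns gives an empty column.
MaterialsReport_df['Tag'] = MaterialsTracking_df['Testing School DBN'] + " - " + MaterialsTracking_df['EXAM']
MaterialsReport_df['Scoring Site'] = MaterialsTracking_df['Scoring Site DBN']

# np.where on zero-length inputs returns a zero-length array.
MaterialsReport_df['Scoring Site Including X'] = np.where(
    MaterialsTracking_df['# BOXES'] == "X",
    MaterialsTracking_df['Scoring Site DBN'].astype(str) + " [X]",
    MaterialsTracking_df['Scoring Site DBN'])

# A left join of an empty frame against an empty lookup table is also fine.
MaterialsReport_df['EXAM'] = MaterialsTracking_df['EXAM']
MaterialsReport_df['Exams for this Site'] = MaterialsReport_df.join(
    Sites_To_Exams_df.set_index('DBN4')['ALL EXAMS'], on='Scoring Site')['ALL EXAMS']

print(MaterialsReport_df.shape)
```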

When I pass the function through np.vectorize on an empty DataFrame, I get: ValueError: cannot call vectorize on size 0 inputs unless otypes is set
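The message names the parameter involved. np.vectorize normally learns its output dtype by calling the function on the first input element. With zero-length inputs there is no element to call it on, so it refuses unless otypes is given up front. The minimal check below uses the question's valid_site with hypothetical zero-length inputs. The choice of otypes=[object] and the "EXAM1, EXAM2" format for the exam list are assumptions made here, not taken from the question.

```python
import numpy as np

def valid_site(exam, exams_for_this_site):
    if exam in exams_for_this_site:
        return 'Y'
    else:
        return 'N'

empty = np.array([], dtype=object)

# Without otypes, vectorize needs a trial call to learn the output dtype,
# and zero-length input gives it nothing to call.
try:
    np.vectorize(valid_site)(empty, empty)
    raised = False
except ValueError as e:
    raised = True
    print("ValueError:", e)

# Declaring the output type up front skips that trial call.
safe_valid_site = np.vectorize(valid_site, otypes=[object])
print(safe_valid_site(empty, empty))
print(safe_valid_site(['EXAM1', 'EXAM3'], ['EXAM1, EXAM2', 'EXAM1, EXAM2']))
```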

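The best workaround I have so far (and I don't think it's a good one at all) is to write an error-handling decorator and use it to wrap a method inside NumPy's vectorize class: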

Error-handling decorator:

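import functools

def get_decorator(errors=(Exception, ), default_value=None):
    """Build a decorator that catches `errors`, prints them, and returns `default_value`."""

    def decorator(func):

        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def new_func(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except errors as e:
                print("Got error! ", repr(e))
                return default_value

        return new_func

    return decorator

error_wrap = get_decorator((ValueError, TypeError))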
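The same decorator could also be applied to the vectorized callable at the call site instead of inside NumPy's source, which avoids editing an installed package. This is an alternative placement, not the approach described in the question. In this sketch the wrapper prints the error on the empty input and returns default_value (None), and pandas broadcasts that scalar into a zero-row column. The decorator is repeated in compact form so the block runs on its own. The zero-row report frame is hypothetical.

```python
import numpy as np
import pandas as pd

def get_decorator(errors=(Exception, ), default_value=None):
    def decorator(func):
        def new_func(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except errors as e:
                print("Got error! ", repr(e))
                return default_value
        return new_func
    return decorator

error_wrap = get_decorator((ValueError, TypeError))

def valid_site(exam, exams_for_this_site):
    return 'Y' if exam in exams_for_this_site else 'N'

# Wrap the vectorized callable itself rather than NumPy internals.
safe_valid_site = error_wrap(np.vectorize(valid_site))

# Hypothetical zero-row report frame.
report = pd.DataFrame({'EXAM': pd.Series([], dtype=object),
                       'Exams for this Site': pd.Series([], dtype=object)})

# The size-0 ValueError is caught and printed; None is returned instead.
result = safe_valid_site(report['EXAM'], report['Exams for this Site'])
report['Valid Site/Exam?'] = result
```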

Example of the edited NumPy source (the decorator applied to vectorize._vectorize_call):

    @error_suppress.error_wrap
    def _vectorize_call(self, func, args):
        """Vectorized call to `func` over positional `args`."""
        if self.signature is not None:
            res = self._vectorize_call_with_signature(func, args)
        elif not args:
            res = func()
        else:
            ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)

            # Convert args to object arrays first
            inputs = [array(a, copy=False, subok=True, dtype=object)
                      for a in args]

            outputs = ufunc(*inputs)

            if ufunc.nout == 1:
                res = array(outputs, copy=False, subok=True, dtype=otypes[0])
            else:
                res = tuple([array(x, copy=False, subok=True, dtype=t)
                             for x, t in zip(outputs, otypes)])
        return res
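The decorator catches an error for the whole vectorized call, so a single unexpected value discards the entire column. For example, 'Exams for this Site' can hold NaN when the join finds no matching site, and `exam in nan` then raises TypeError. Below is a hypothetical per-element variant, valid_site_or_nan, offered as an alternative rather than taken from the question. It returns NaN only for the rows it cannot evaluate, which fits the stated goal that partial output beats none. The sample rows are made up.

```python
import numpy as np
import pandas as pd

def valid_site_or_nan(exam, exams_for_this_site):
    """Hypothetical NaN-tolerant variant of the question's valid_site."""
    # Anything other than a pair of strings (NaN from an unmatched join, None,
    # numbers) becomes NaN for this row only instead of failing the whole column.
    if not isinstance(exam, str) or not isinstance(exams_for_this_site, str):
        return np.nan
    return 'Y' if exam in exams_for_this_site else 'N'

# otypes avoids the size-0 ValueError; object dtype can hold both strings and NaN.
vectorized = np.vectorize(valid_site_or_nan, otypes=[object])

# Made-up rows: the second site had no match in the lookup, so its exam list is NaN.
report = pd.DataFrame({'EXAM': ['EXAM1', 'EXAM1', 'EXAM3'],
                       'Exams for this Site': ['EXAM1, EXAM2', np.nan, 'EXAM1, EXAM2']})
report['Valid Site/Exam?'] = vectorized(report['EXAM'], report['Exams for this Site'])

# The same callable also handles a zero-row slice of the frame.
empty = report.iloc[0:0]
empty_result = vectorized(empty['EXAM'], empty['Exams for this Site'])

print(report)
```

The isinstance checks are one possible policy. A stricter version could catch only TypeError inside the function and leave other errors visible.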
