Question

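In machine learning you often have large datasets and have to preprocess them differently depending on the algorithm you use later. How would you write a function that remembers a particular preprocessing pipeline and then loads the results directly instead of recomputing them?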

Here is a small code example to help you understand what I mean:

import numpy as np
import pickle

def f(data, scaling=None, reduction=None):
    # Here the function should check whether it has already been called with the given keywords.
    # If so, it should just load the results of that call from the hard drive and return them.

    # data processing section
    if scaling == 'standard':
        # do scaling stuff
        pass
    elif scaling == 'min_max':
        # do other scaling stuff
        pass

    if reduction == 'PCA':
        # do reduction stuff
        pass
    elif reduction == 'ICA':
        # do other reduction stuff
        pass

    # saving results on hard drive
    with open('anypath', 'wb') as file:
        pickle.dump(data, file)
    return data

data = np.random.randint(100, size=(100,5))
config = {'scaling':'standard',
          'reduction':'ICA'}

data_processed = f(data, **config)
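
To make the goal more concrete, here is a minimal sketch of one way the check could work, assuming the keyword values are JSON-serializable and the input is a NumPy array. It hashes the configuration together with the data to build a file name, and loads that file if it already exists. The names `CACHE_DIR`, `cache_path` and `f_cached` are hypothetical, the scaling code is only a stand-in for the real processing, and the reduction step is left out.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = Path(tempfile.gettempdir()) / "preprocess_cache"


def cache_path(data, **config):
    """Return a file path that is unique to this data and keyword configuration."""
    digest = hashlib.sha256()
    # Sorting the keys makes the hash independent of keyword order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    # Hash the input as well, so different datasets do not share a cache entry.
    digest.update(str(data.dtype).encode("utf-8"))
    digest.update(str(data.shape).encode("utf-8"))
    digest.update(np.ascontiguousarray(data).tobytes())
    return CACHE_DIR / f"{digest.hexdigest()}.pkl"


def f_cached(data, scaling=None, reduction=None):
    path = cache_path(data, scaling=scaling, reduction=reduction)

    # Same data and same keywords seen before: load the stored result and return.
    if path.exists():
        with open(path, "rb") as file:
            return pickle.load(file)

    # Stand-in processing section (the reduction step is omitted in this sketch).
    result = data.astype(float)
    if scaling == "standard":
        result = (result - result.mean(axis=0)) / result.std(axis=0)
    elif scaling == "min_max":
        col_min = result.min(axis=0)
        result = (result - col_min) / (result.max(axis=0) - col_min)

    # Save the result under the configuration-specific path.
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as file:
        pickle.dump(result, file)
    return result


rng = np.random.default_rng(0)
data = rng.integers(100, size=(100, 5))
config = {"scaling": "standard", "reduction": "ICA"}

first = f_cached(data, **config)   # computed and written to disk (unless already cached)
second = f_cached(data, **config)  # loaded from disk
```

Hashing the full array costs one pass over the data on every call, which may or may not be acceptable for very large datasets; hashing a dataset name or file path instead would be cheaper but would not notice changed contents. Something like `joblib.Memory` might also cover this use case, but I am not sure whether it fits.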

Check whether a function has already been called with the given keyword configuration

0 answers: