Question

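I have a Python class that needs to reference a large dataset. I need to create thousands of instances of the class, so I don't want to load the dataset every time. It would be straightforward to put the data in a separate class that has to be created first and then passed as an argument to the other class: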

class Dataset:
    def __init__(self, filename):
        # load dataset...
        self.filename = filename

class Class_using_dataset:
    def __init__(self, ds):
        # use the dataset and do other stuff
        self.ds = ds

ds = Dataset('file.csv')
c1 = Class_using_dataset(ds)
c2 = Class_using_dataset(ds)
# etc...

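But I don't want my users to have to deal with the dataset, since it is always the same and I could handle it in the background.

Is there a Pythonic/canonical way to load the data into the global namespace when I create the first instance of the class? I was hoping for something like: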

class Class_using_dataset():
    def __init__(self):
        if dataset doesn't exist:
             load dataset into global namespace
        use dataset

Answer 1

If the dataset is shared among all instances of the class, make it a class variable.

class Dataset:
    def __init__(self, filename):
        # load dataset...
        self.filename = filename

class Class_using_dataset:
    def __init__(self):
        # use the dataset (reachable as self.ds) and do other stuff
        pass

Class_using_dataset.ds = Dataset('file.csv')
c1 = Class_using_dataset()
c2 = Class_using_dataset()
# etc...
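
To check that the class variable really is shared rather than copied, you can compare identities across instances. The following is only a minimal sketch: the in-memory list stands in for the real file load, and load_count and rows are hypothetical names added purely for illustration.

```python
class Dataset:
    load_count = 0  # hypothetical counter, only to show how often loading happens

    def __init__(self, filename):
        # Stand-in for the real (expensive) load of `filename`.
        Dataset.load_count += 1
        self.filename = filename
        self.rows = [1, 2, 3]


class Class_using_dataset:
    ds = None  # assigned once, below

    def __init__(self):
        # `self.ds` falls back to the class attribute, so all instances see one object.
        self.total = sum(self.ds.rows)


Class_using_dataset.ds = Dataset('file.csv')
c1 = Class_using_dataset()
c2 = Class_using_dataset()

print(c1.ds is c2.ds)      # True
print(Dataset.load_count)  # 1
```

Note that assigning self.ds = ... inside an instance would create an instance attribute that shadows the shared one, so write through the class name when you mean to replace the shared dataset.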

Answer 2

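You can either load the dataset into a class variable when the Class_using_dataset class is defined (that is, when its class body is executed, typically at import time), or load it when the user creates the first instance of the class.

The first strategy only requires moving the line that loads the dataset into the class itself.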

class Dataset:
    def __init__(self, filename):
        # load dataset...
        self.filename = filename

class Class_using_dataset:
    # loaded once, when the class body is executed
    ds = Dataset('file.csv')

    def __init__(self):
        # use the dataset and do other stuff
        pass

# `Class_using_dataset.ds` already has the loaded dataset
c1 = Class_using_dataset()
c2 = Class_using_dataset()
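
One consequence of this first strategy is that the load happens as soon as the class is defined, even if no instance is ever created. The sketch below makes the order of events visible; the events list is a hypothetical log added for illustration, and the Dataset body is a stand-in for the real load.

```python
events = []  # hypothetical log, only to make the order of events visible


class Dataset:
    def __init__(self, filename):
        # Stand-in for the real load.
        events.append('loaded ' + filename)
        self.filename = filename


class Class_using_dataset:
    ds = Dataset('file.csv')  # runs while the class body is executed

    def __init__(self):
        events.append('instance created')


events.append('class defined')
c1 = Class_using_dataset()

print(events)
# ['loaded file.csv', 'class defined', 'instance created']
```

If importing the module should stay cheap, the second strategy is probably the better fit.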

For the second, assign None to the class variable, and load the dataset in the __init__ method only if ds is None.

class Dataset:
    def __init__(self, filename):
        # load dataset...
        self.filename = filename

class Class_using_dataset:
    ds = None

    def __init__(self):
        # load the dataset only once, when the first instance is created
        if Class_using_dataset.ds is None:
            Class_using_dataset.ds = Dataset('file.csv')
        # use the dataset and do other stuff

# `Class_using_dataset.ds` is `None`
c1 = Class_using_dataset()
# Now the dataset is loaded
c2 = Class_using_dataset()
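
A possible variation on the same lazy idea, closer to the "load into the global namespace on first use" wording of the question, is a module-level loader cached with functools.lru_cache. This is only a sketch, not something the answers above prescribe: get_dataset and load_count are hypothetical names, and the Dataset body is a stand-in for the real load.

```python
from functools import lru_cache


class Dataset:
    load_count = 0  # hypothetical counter, only to show how often loading happens

    def __init__(self, filename):
        # Stand-in for the real (expensive) load of `filename`.
        Dataset.load_count += 1
        self.filename = filename


@lru_cache(maxsize=None)
def get_dataset(filename='file.csv'):
    # The first call builds the Dataset; later identical calls return the cached object.
    return Dataset(filename)


class Class_using_dataset:
    def __init__(self):
        # Users never touch the dataset; it is fetched behind the scenes.
        self.ds = get_dataset()


c1 = Class_using_dataset()
c2 = Class_using_dataset()

print(c1.ds is c2.ds)      # True
print(Dataset.load_count)  # 1
```

Nothing is loaded until the first instance is created, and the None check disappears from __init__.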

Python: loading data on first class instance
