Question

I have written this code:

import pandas as pd
# gaussian_sample, categorical_sample, string_sample and date_random_sample are the asker's own helpers (not shown)

class_1500_strings = ['transistor', 'resistor', 'diode', 'processor', 'thermistor', '555-timer', 'microcontroller']

class_1500 = {'conductivity': gaussian_sample(100, 10, 250),
              'price_per_unit': gaussian_sample(10, 2, 250),
              'number_bought': categorical_sample(0, 10, 250),
              'manufacturer': string_sample(250, class_1500_strings),
              'acquisition_date': date_random_sample("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", col_length=250),
              'runtime': gaussian_sample(1000, 200, 250)}

def generate_table(class_dict, class_label, number_of_samples):
    X, y = [], []
    for table_idx in range(number_of_samples):
        df = pd.DataFrame(class_dict)
        label = class_label
        X.append(df)
        y.append(label)
    return X, y

X, y = generate_table(class_1500, 0, 5)

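The goal is to build sample artificial DataFrames. My problem is that X is a list of identical DataFrames, instead of the random generators in the class dictionary being called again. How can I make the function produce a list of different datasets (i.e. call the samplers on every pass through the loop)?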

Answer 1

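You need to create a new dictionary for each DataFrame you construct. With your current logic, once class_1500 is defined it loses all connection to the random-generator logic, because its values are just arrays: each sampler ran once, when the dictionary literal was evaluated, and only its result was stored.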
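A minimal sketch of why the values never change. sample_value is a hypothetical stand-in for a sampler such as gaussian_sample, and it returns a single number instead of an array for brevity; a counter records how often it actually runs:

```python
import random

calls = 0  # how many times the sampler has actually run


def sample_value():
    """Hypothetical stand-in for a sampler such as gaussian_sample."""
    global calls
    calls += 1
    return random.gauss(100, 10)


# The sampler runs exactly once, while this dict literal is evaluated.
class_dict = {'conductivity': sample_value()}

# Reading the dict five times just returns the same stored number.
readings = [class_dict['conductivity'] for _ in range(5)]
print(calls, len(set(readings)))  # 1 1


# Wrapping the dict in a function re-runs the sampler on every call.
def make_data():
    return {'conductivity': sample_value()}


fresh = [make_data()['conductivity'] for _ in range(5)]
print(calls)  # 6
```

The dictionary only ever holds results; moving the literal into a function body is what makes the samplers run again.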

One way is to define a separate function that returns different arrays every time it is called:

def make_data():
    # Build a brand-new dict on each call, so every sampler runs again
    return {'conductivity': gaussian_sample(100, 10, 250),
            ...
            'runtime': gaussian_sample(1000, 200, 250)}

def generate_table(class_label, number_of_samples):
    X, y = [], []
    for table_idx in range(number_of_samples):
        df = pd.DataFrame(make_data())
        label = class_label
        X.append(df)
        y.append(label)
    return X, y

X, y = generate_table(0, 5)
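For reference, a self-contained version of this approach that can be run as-is. The samplers below are hypothetical stand-ins built on Python's random module, since the asker's real gaussian_sample, categorical_sample and string_sample are not shown (their exact semantics are assumptions here). acquisition_date is left out rather than guessing how date_random_sample works, and the n parameter of make_data is an addition for illustration:

```python
import random

import pandas as pd


# Hypothetical stand-ins for the asker's helpers (not their real code).
def gaussian_sample(mean, std, n):
    return [random.gauss(mean, std) for _ in range(n)]


def categorical_sample(low, high, n):
    # Assumed: integers between low and high, inclusive.
    return [random.randint(low, high) for _ in range(n)]


def string_sample(n, choices):
    return random.choices(choices, k=n)


class_1500_strings = ['transistor', 'resistor', 'diode', 'processor',
                      'thermistor', '555-timer', 'microcontroller']


def make_data(n=250):
    # Every call runs the samplers again, so each dict holds fresh columns.
    return {'conductivity': gaussian_sample(100, 10, n),
            'price_per_unit': gaussian_sample(10, 2, n),
            'number_bought': categorical_sample(0, 10, n),
            'manufacturer': string_sample(n, class_1500_strings),
            'runtime': gaussian_sample(1000, 200, n)}


def generate_table(class_label, number_of_samples):
    X, y = [], []
    for _ in range(number_of_samples):
        X.append(pd.DataFrame(make_data()))  # new data on each iteration
        y.append(class_label)
    return X, y


X, y = generate_table(0, 5)
print(len(X), X[0].shape)  # 5 (250, 5)
```

Because make_data is called inside the loop, X[0] and X[1] now hold independently drawn values instead of the same arrays.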

Answer 2

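You are constructing a DataFrame from the same value (class_dict) on every iteration of the loop. If you want the DataFrame's values to differ from one iteration to the next, you have to pass in a different value. Try changing the for loop to for key in class_dict and passing key as the argument to the DataFrame.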

This gives you one DataFrame per dictionary key, where each DataFrame's values are generated by the value stored under that key (the sampling function).
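This answer treats the dictionary values as the sampling functions themselves. One way to make that idea concrete, assuming the values are stored as zero-argument callables (lambdas here) rather than the arrays those functions return, is to call every sampler on each iteration. This is one possible reading, not the answer's literal for key in class_dict loop, and the lambdas use random.gauss as hypothetical stand-ins for the asker's helpers:

```python
import random

import pandas as pd

# Hypothetical: each value is a zero-argument callable, not an array,
# so nothing is sampled when the dict is defined.
class_1500 = {
    'conductivity': lambda: [random.gauss(100, 10) for _ in range(250)],
    'price_per_unit': lambda: [random.gauss(10, 2) for _ in range(250)],
    'runtime': lambda: [random.gauss(1000, 200) for _ in range(250)],
}


def generate_table(class_dict, class_label, number_of_samples):
    X, y = [], []
    for _ in range(number_of_samples):
        # Call every sampler again on this iteration to get fresh columns.
        columns = {key: sampler() for key, sampler in class_dict.items()}
        X.append(pd.DataFrame(columns))
        y.append(class_label)
    return X, y


X, y = generate_table(class_1500, 0, 3)
```

With the asker's helpers, each entry would presumably become something like 'conductivity': lambda: gaussian_sample(100, 10, 250). A side benefit of this design is that generate_table keeps its class_dict parameter, so dictionaries for other classes can be passed in unchanged.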

Python function does not call the given arguments on each iteration of the inner loop
