Question

I looked at "How do I generate a new set of values from existing dataset" and "generate data by using existing dataset as the base dataset", but neither covered my needs. I then read a lot of answers about loops, but they did not get me all the way there.

I have the classic Adult dataset. After cleaning it and holding some rows back for validation, it looks like this:

Adult dataset - 43958 rows and 12 columns

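I want to run a loop that takes each row and adds a new row in which age is increased by 1, while all other values stay equal to those of the original row.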

I tried two different approaches. The first:

df1 = newDataFrame

# iterate through each row of dataframe
for index, row in df1.iterrows():
    new_row = {'age': index+1, 'workclass': [], 'education': [], 'educational-num': [], 'marital-status': [], 'occupation': [],
               'race': [], 'gender': [], 'capital-gain': [], 'capital-loss': [], 'hours-per-week': [], 'income': []}
print(new_row)

But this gave me:

 {'age': 35596, 'workclass': [], 'education': [], 'educational-num': [], 'marital-status': [], 'occupation': [], 'race': [], 'gender': [], 'capital-gain': [], 'capital-loss': [], 'hours-per-week': [], 'income': []}

I also tried:

df1 = newDataFrame
colums =list(df1)

#iterate through each row of dataframe
for index, row in df1.iterrows():
    values = [([0]+1),[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11]]
    zipped =zip(colums, values)
    a_dictionary = dict(zipped)

    print(a_dictionary)

But this raises an error:

> TypeError: can only concatenate list (not "int") to list

I know this is because colums = list, but I don't know how to change it. I tried a few append() calls, but they did not help.

So, after two days, I am turning to you.

The goal is to make the dataset larger while keeping the strong correlations between the values.

Perfect, thank you @gofvonx! I had to make one simple change, but this works:

df1 = newDataFrame
df_new= df1.copy()
df_new.age += 1
pd.concat([df1, df_new], axis=0, ignore_index=True)

Answer 1

Your code above has a few problems. For example, new_row is overwritten on every iteration, so the previous values are never stored.
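If you do want to keep the loop, the sketch below shows one way to repair it. It uses a small made-up frame as a stand-in for df1 (the three rows and the reduced column set are assumptions for illustration, as are the names new_rows and df_loop). Two points differ from your attempts: the values are read from row rather than from index or from list literals (an expression like [0]+1 adds an int to a list, which is what raises the TypeError), and each new row is appended to a list so that nothing is overwritten.

```python
import pandas as pd

# Toy stand-in for df1; the values are invented for illustration only.
df1 = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": ["Private", "Local-gov", "Private"],
    "income": ["<=50K", ">50K", ">50K"],
})

new_rows = []  # collect every new row instead of overwriting one variable
for index, row in df1.iterrows():
    new_row = row.to_dict()   # copy all column values of the current row
    new_row["age"] += 1       # change only the age
    new_rows.append(new_row)

# Stack the original rows and the new rows into one larger frame.
df_loop = pd.concat([df1, pd.DataFrame(new_rows)], ignore_index=True)
print(df_loop)
```

On a frame with tens of thousands of rows, iterrows() is typically much slower than a vectorized operation.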

However, you do not need a loop. You could try:

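df_new = df1.copy()   # work on a copy so df1 itself stays unchanged
df_new['age'] += 1    # increase every age by 1 in a single vectorized step
pd.concat([df1, df_new], axis=0, ignore_index=True)  # stack the original and shifted rows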

Note that ignore_index=True creates a new index 0, ..., n-1 (see the pandas.concat documentation).
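To make the effect of ignore_index concrete, here is a small sketch on a made-up two-row frame. The column values, the index labels 10 and 11, and the names kept and reset are assumptions for illustration and are not taken from the Adult dataset.

```python
import pandas as pd

# Toy frame with non-default index labels so the difference is visible.
df1 = pd.DataFrame(
    {"age": [25, 38], "income": ["<=50K", ">50K"]},
    index=[10, 11],
)

df_new = df1.copy()
df_new["age"] += 1

# Without ignore_index the original labels are repeated.
kept = pd.concat([df1, df_new], axis=0)
print(kept.index.tolist())   # [10, 11, 10, 11]

# With ignore_index=True the result gets a fresh 0..n-1 index.
reset = pd.concat([df1, df_new], axis=0, ignore_index=True)
print(reset.index.tolist())  # [0, 1, 2, 3]
```

pd.concat returns a new DataFrame and leaves df1 untouched, so assign the result to a variable if you want to keep working with the larger dataset.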

Python - Generating a new, larger dataset from an existing dataset by looping over rows
