Question

I have a DataFrame that contains the path to an Excel file, a sheet name, and an ID, each in its own column:

import pandas as pd

df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
                   ['00100', 'two.xlsx', 'sheet2'],
                   ['05300', 'thr.xlsx', 'sheet3'],
                   ['95687', 'fou.xlsx', 'sheet4'],
                   ['05300', 'fiv.xlsx', 'sheet5']],
                  columns=['id', 'file', 'sheet'])

The DataFrame looks like this:

      id         file   sheet
0  00100  c:\one.xlsx  sheet1
1  00100  c:\two.xlsx  sheet2
2  05300  c:\thr.xlsx  sheet3
3  95687  c:\fou.xlsx  sheet4
4  05300  c:\fiv.xlsx  sheet5

I created a function to use with apply; it reads each file and returns a DataFrame.

def getdata(row):
    file = row['file']
    sheet = row['sheet']
    id = row['id']
    tempdf = pd.ExcelFile(file)     # Used on purpose
    tempdf = tempdf.parse(sheet)    # Used on purpose
    tempdf['ID'] = id
    return tempdf
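A quick way to see what apply passes to getdata is to call the function on a single row: with axis=1, each row arrives as a Series indexed by the column names, which is what df.iloc[0] returns. The sketch below assumes a hypothetical read_sheet helper that stands in for pd.ExcelFile(file).parse(sheet), only so it runs without any Excel files; it is not part of the original code.

```python
import pandas as pd

df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
                   ['05300', 'thr.xlsx', 'sheet3']],
                  columns=['id', 'file', 'sheet'])


def read_sheet(file, sheet):
    # Hypothetical stand-in for pd.ExcelFile(file).parse(sheet):
    # returns a small two-row frame instead of reading a workbook.
    return pd.DataFrame({'value': [1, 2]})


def getdata(row):
    tempdf = read_sheet(row['file'], row['sheet'])
    tempdf['ID'] = row['id']  # broadcast the row's ID to every parsed row
    return tempdf


# With axis=1, apply hands getdata one row at a time as a Series whose
# index is the column names; df.iloc[0] produces the same kind of object.
row = df.iloc[0]
print(list(row.index))  # ['id', 'file', 'sheet']

out = getdata(row)
print(out)
```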

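Then I call apply on the initial DataFrame so that it returns one dataset per row. The problem is that I don't know how to store the DataFrames created this way.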

I tried storing the DataFrames in a new column, but the column ends up containing None:

df['data'] = df.apply(getdata, axis=1)

I tried to create a dictionary, but the way I came up with is obviously useless:

results = {df.apply(getdata, axis=1)}  # for this one, in the function I tried to return id, tempdf
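A dictionary can work if it is built from the individual rows rather than from the output of apply. Note that the id values repeat ('00100' and '05300' each appear twice), so a dict keyed on id alone would overwrite earlier results. The sketch below keys on the row label instead and then, optionally, merges the frames per ID. As above, read_sheet is a hypothetical stand-in for the Excel read, used only so the example runs.

```python
import pandas as pd

df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
                   ['00100', 'two.xlsx', 'sheet2'],
                   ['05300', 'thr.xlsx', 'sheet3'],
                   ['95687', 'fou.xlsx', 'sheet4'],
                   ['05300', 'fiv.xlsx', 'sheet5']],
                  columns=['id', 'file', 'sheet'])


def read_sheet(file, sheet):
    # Hypothetical stand-in for pd.ExcelFile(file).parse(sheet).
    return pd.DataFrame({'value': [1, 2], 'source': [f'{file}:{sheet}'] * 2})


def getdata(row):
    tempdf = read_sheet(row['file'], row['sheet'])
    tempdf['ID'] = row['id']
    return tempdf


# Key by the row label, which is unique; keying by 'id' would silently
# drop the first frame for '00100' and '05300', which appear twice.
results = {idx: getdata(row) for idx, row in df.iterrows()}

# Optionally regroup: one concatenated frame per distinct 'id' value.
by_id = {key: pd.concat([results[i] for i in group.index], ignore_index=True)
         for key, group in df.groupby('id')}

print(sorted(by_id))        # ['00100', '05300', '95687']
print(len(by_id['00100']))  # 4
```

Each value in results is an ordinary DataFrame, so results[2] gives the data read from thr.xlsx / sheet3.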

Finally, I ended up turning the 'id' column into the index and iterating over it as follows:

for id in df.index:
    df[id] = getdata(df.loc[id], id)

But I'd like to know whether there is a way to store the resulting DataFrames without using an iterator.
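For what it's worth, DataFrame.apply with axis=1 also calls the function once per row in Python, so avoiding an explicit loop does not gain vectorization here. A common pattern, sketched below under the same assumption of a hypothetical read_sheet stand-in for the Excel read, is to collect the returned frames in a list and stack them with pd.concat.

```python
import pandas as pd

df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
                   ['00100', 'two.xlsx', 'sheet2'],
                   ['05300', 'thr.xlsx', 'sheet3'],
                   ['95687', 'fou.xlsx', 'sheet4'],
                   ['05300', 'fiv.xlsx', 'sheet5']],
                  columns=['id', 'file', 'sheet'])


def read_sheet(file, sheet):
    # Hypothetical stand-in for pd.ExcelFile(file).parse(sheet).
    return pd.DataFrame({'value': [1, 2], 'source': [f'{file}:{sheet}'] * 2})


def getdata(row):
    tempdf = read_sheet(row['file'], row['sheet'])
    tempdf['ID'] = row['id']
    return tempdf


# One call per row, like apply(axis=1); each returned frame is kept
# as a separate list element.
frames = [getdata(row) for _, row in df.iterrows()]

# Stack everything into one long frame; the 'ID' column still records
# which row of df each block came from.
combined = pd.concat(frames, ignore_index=True)

print(combined.shape)                          # (10, 3)
print(combined['ID'].value_counts()['00100'])  # 4
```

If the per-row results should stay separate, the frames list can simply be kept; pd.concat is only needed when one combined table is wanted.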

Thanks for your feedback.

The result of apply is a DataFrame: how do I store it?
