Question

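I have the following data set in a folder:

a) 10 Excel workbooks (with different file names)

b) Each workbook has 7 sheets. Of the 7 sheets in each workbook, 2 have exactly the same names and the remaining 5 have different sheet names.

c) I need to concatenate the 5 differently named sheets from each of the 10 workbooks.

d) That makes 10 * 5 = 50 sheets to concatenate in total.

How can I do this so that all 50 sheets are concatenated and the final output is a single "master" spreadsheet with all 50 sheets appended (without concatenating the two identically named sheets in each Excel file)?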

I tried to concatenate the sheets with code in a Jupyter notebook, but it did not help.

Thanks for reading.

Answer 1

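IIUC, you need to read every sheet in the 10 workbooks and append each resulting DataFrame to a list, data_sheets. One way to do this is to also keep a list, names_to_find, and append each sheet name to it as you iterate.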

names_to_find = []
data_sheets = []
for excelfile in excelfile_list:
    xlsx = pd.ExcelFile(excelfile)

    for sheet in xlsx.sheet_names:
        data_sheets.append(xlsx.parse(sheet))
        names_to_find.append(sheet)
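
The loop above assumes that excelfile_list already holds the workbook paths. A minimal sketch of one way to build it, assuming the ten workbooks are .xlsx files sitting in a single folder (the helper name list_workbooks and the "." folder are hypothetical, not part of the original answer):

```python
from pathlib import Path


def list_workbooks(folder):
    """Return the sorted paths of the .xlsx workbooks in `folder` (hypothetical helper)."""
    return sorted(
        p for p in Path(folder).glob("*.xlsx")
        # Skip the "~$..." lock files that Excel creates while a workbook is open.
        if not p.name.startswith("~$")
    )


# Assumed location: the current directory. Point this at the folder with the workbooks.
excelfile_list = list_workbooks(".")
```

Sorting only makes the order of the appended sheets reproducible between runs; it is not required for the selection logic.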

After all the data has been read, you can use names_to_find and np.unique to find the unique sheet names and how often each one occurs.

# find unique elements and return counts
unique, counts = np.unique(names_to_find, return_counts=True)

# find unique sheet names with a frequency of one
unique_set = unique[counts == 1]

You can then use np.argwhere to find the indices in names_to_find whose sheet names appear in unique_set:

# find the indices where the unique sheet names exist
idx_to_select = np.argwhere(np.isin(names_to_find, unique_set)).flatten()
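
To make the two NumPy steps concrete, here is a small worked example on made-up sheet names from two hypothetical workbooks, where "Summary" and "Notes" stand in for the two shared sheets:

```python
import numpy as np

# Toy sheet names (assumed for illustration): "Summary" and "Notes" appear in
# both workbooks, every other name appears exactly once.
names_to_find = ["Summary", "Notes", "North", "South",
                 "Summary", "Notes", "East", "West"]

# np.unique returns the sorted distinct names and how often each occurs.
unique, counts = np.unique(names_to_find, return_counts=True)

# Keep only the names that occur exactly once.
unique_set = unique[counts == 1]

# Positions in names_to_find (and therefore in data_sheets) to keep.
idx_to_select = np.argwhere(np.isin(names_to_find, unique_set)).flatten()

print(unique_set.tolist())     # ['East', 'North', 'South', 'West']
print(idx_to_select.tolist())  # [2, 3, 6, 7]
```

Note that this selection relies on each wanted sheet name occurring exactly once across all workbooks. If a wanted name happened to repeat in another workbook, it would be dropped along with the shared sheets.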

Finally, with a short list comprehension you can subset data_sheets so that it contains only the data of interest:

# use a list comprehension to subset data_sheets
data_sheets = [data_sheets[i] for i in idx_to_select]
data = pd.concat(data_sheets)

All together:

import pandas as pd
import numpy as np

names_to_find = []
data_sheets = []
for excelfile in excelfile_list:
    xlsx = pd.ExcelFile(excelfile)

    for sheet in xlsx.sheet_names:
        data_sheets.append(xlsx.parse(sheet))
        names_to_find.append(sheet)

# find unique elements and return counts
unique, counts = np.unique(names_to_find, return_counts=True)

# find unique sheet names with a frequency of one
unique_set = unique[counts == 1]

# find the indices where the unique sheet names exist
idx_to_select = np.argwhere(np.isin(names_to_find, unique_set)).flatten()

# use a list comprehension to subset data_sheets
data_sheets = [data_sheets[i] for i in idx_to_select]

# concat the data
data = pd.concat(data_sheets)
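
The code above parses every sheet and discards the shared ones afterwards. A possible variant, sketched here under assumptions rather than taken from the answer, first collects only the sheet names, then parses just the sheets whose names occur once, and finally writes the "master" workbook. The function names, the source_file and source_sheet columns, and the master.xlsx output path are all hypothetical, and writing .xlsx with to_excel assumes an Excel writer engine such as openpyxl is installed.

```python
from collections import Counter

import pandas as pd


def pick_unique_sheets(sheets_by_file):
    """Return (file, sheet) pairs whose sheet name occurs once across all files."""
    counts = Counter(s for names in sheets_by_file.values() for s in names)
    return [(f, s) for f, names in sheets_by_file.items()
            for s in names if counts[s] == 1]


def build_master(excelfile_list, output_path="master.xlsx"):
    """Concatenate the uniquely named sheets and write them to one workbook."""
    # Pass 1: open each workbook and collect sheet names only.
    books = {f: pd.ExcelFile(f) for f in excelfile_list}
    wanted = pick_unique_sheets({f: x.sheet_names for f, x in books.items()})

    # Pass 2: parse just the wanted sheets and record where each row came from.
    frames = [books[f].parse(s).assign(source_file=str(f), source_sheet=s)
              for f, s in wanted]
    for x in books.values():
        x.close()

    # ignore_index=True gives the master table one continuous row index.
    data = pd.concat(frames, ignore_index=True)
    data.to_excel(output_path, index=False)
    return data
```

Calling build_master(excelfile_list) would then return the combined DataFrame and write it to a single sheet. The two source columns are optional; they are included only so that rows in the master sheet can be traced back to their workbook and sheet.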

Append Excel spreadsheets using Pandas
