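I have a list of folder names. I want to go through the folders based on this list and merge the Excel files found in those folders.
Example: say I have the directory "C:/Users/XXX/Documents/File Tracking", which contains folders A, B, C, D, E and F. I also have a list of folder names: lst = ["A", "B", "D"]
Now I want to go through folders A, B and D, merge the Excel files found in them into one, and ignore every folder that is not in the list.
Here is some code that works if I want to merge the files from all subfolders: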
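import glob
import pandas as pd

frames = []
for f in glob.glob("C:/Users/XXX/Documents/File Tracking/*"):
    frames.append(pd.read_excel(f))
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
all_data = pd.concat(frames, ignore_index=True)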
Answer 0 (score: 1)
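You can do this the most direct way: get the list of directories inside the chosen base directory, filter it against your list, and then look for spreadsheets in each remaining directory. See the boilerplate below: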
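import glob
import os

import pandas as pd

path = "C:/Users/XXX/Documents/File Tracking/"
allowed = ["A", "B", "D"]
# list of first-level directories that are in the allowed list
dirs = [name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name)) and name in allowed]
frames = []
for dirname in dirs:
    # iterate over all files that match a pattern, for example *.xlsx
    for file_name in glob.glob(os.path.join(path, dirname, "*.xlsx")):
        # process each file: here, read it and collect the DataFrame
        frames.append(pd.read_excel(file_name))
# combine everything into a single DataFrame
all_data = pd.concat(frames, ignore_index=True)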
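A variation on the boilerplate above, sketched with `pathlib` and assuming the folders in `allowed` sit directly under the base directory: instead of listing every subfolder and then filtering, it loops over the allowed names and skips any that do not exist. `find_excel_files` is a hypothetical helper name, not part of any library, and the list it returns can be fed to `pd.read_excel` as in the answer.

```python
from pathlib import Path


def find_excel_files(base_dir, allowed, patterns=("*.xlsx", "*.xls")):
    """Return Excel files found directly inside the allowed subfolders of base_dir.

    Hypothetical helper: names in `allowed` that are not existing
    directories under base_dir are skipped instead of raising an error.
    """
    base = Path(base_dir)
    found = []
    for name in allowed:
        folder = base / name
        # skip names in the list that are not existing directories
        if not folder.is_dir():
            continue
        for pattern in patterns:
            # Path.glob with a plain pattern only looks one level deep
            found.extend(sorted(folder.glob(pattern)))
    return found


# Possible usage (requires pandas and an Excel engine such as openpyxl):
# import pandas as pd
# files = find_excel_files("C:/Users/XXX/Documents/File Tracking", ["A", "B", "D"])
# all_data = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)
```

Iterating over the allowed names keeps the result in the order of your list, which can be handy if the merged rows should follow that order.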
Answer 1 (score: 1)
If I understand you correctly, this should work. See the comments in the code for more explanation.
import os

import pandas as pd

# assumes you have a list of the folder paths
def consolidate_excel_files(folder_paths: list) -> pd.DataFrame:
    # used to collect all DataFrames from the folders
    df_collection = []
    for folder in folder_paths:
        # make sure the path is actually a directory
        if os.path.isdir(folder):
            # list comprehension that reads every Excel file in the folder into a DataFrame;
            # any stray file that is not .xlsx or .xls is ignored
            all_files_as_df = [pd.read_excel(os.path.join(folder, file))
                               for file in os.listdir(folder)
                               if os.path.splitext(file)[1].lower() in ('.xlsx', '.xls')]
            # pd.concat needs a flat (1-D) list of DataFrames, so extend instead of append
            df_collection.extend(all_files_as_df)
    # assuming the index is not important
    return pd.concat(df_collection, ignore_index=True)
There may be a simpler way if you can make a few assumptions, but this works.
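A note on the extension filter: a list written as `['.xlsx' or '.xls']` does not hold two extensions. `or` returns its first truthy operand, so the expression collapses to `['.xlsx']` and `.xls` files are silently skipped. The small sketch below shows the difference; `is_excel` is a hypothetical helper, and the case-insensitive comparison is an added assumption so that names such as `REPORT.XLSX` also match.

```python
import os

# `'.xlsx' or '.xls'` is evaluated first; `or` returns its first truthy
# operand, so this list contains only '.xlsx'
buggy_exts = ['.xlsx' or '.xls']

# the intended check uses a tuple with two separate elements
good_exts = ('.xlsx', '.xls')


def is_excel(file_name):
    """Return True for names ending in .xlsx or .xls, ignoring case."""
    return os.path.splitext(file_name)[1].lower() in good_exts


print(buggy_exts)              # ['.xlsx']
print(is_excel("report.xls"))  # True
print(is_excel("notes.txt"))   # False
```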