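In Python, can I delete columns across multiple Excel files? I have a folder containing several xlsx files. Each file has about 5 columns (date, value, latitude, longitude, region). I want to delete every column except date and value in each Excel file.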
Answer 0: (score: 4)
Assuming you have a folder containing several Excel files:
import pandas as pd
from pathlib import Path

folder = Path('excel_files')
xlsx_only_files = list(folder.rglob('*.xlsx'))

def process_files(xls_file):
    # stem is a pathlib attribute that returns just the filename,
    # without the parent directory or the suffix
    filename = xls_file.stem
    # sheet_name=None reads every sheet into a dictionary
    # keyed by sheet name
    # usecols reads in only the relevant columns
    df = pd.read_excel(xls_file, usecols=['date', 'value'], sheet_name=None)
    df_cleaned = [data.assign(sheetname=sheetname, filename=filename)
                  for sheetname, data in df.items()]
    return df_cleaned

# process_files returns a list of DataFrames per file,
# so flatten the lists before concatenating
combo = [frame for xlsx in xlsx_only_files for frame in process_files(xlsx)]
final = pd.concat(combo, ignore_index=True)
Let me know how it goes.
Answer 1: (score: -1)
I suggest defining the columns you want to keep as a list, then selecting them into a new DataFrame.
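Because `process_files` returns a list of DataFrames for each file, the per-file lists have to be flattened into a single list before `pd.concat` can combine them. Here is a minimal sketch of that step, with in-memory DataFrames standing in for what `pd.read_excel(..., sheet_name=None)` returns. The helper names `tag_sheets` and `combine` and the sample data are made up for illustration.

```python
import pandas as pd

def tag_sheets(sheets, filename):
    """Add sheetname and filename columns to every sheet of one workbook."""
    return [data.assign(sheetname=name, filename=filename)
            for name, data in sheets.items()]

def combine(workbooks):
    """workbooks maps filename -> {sheet name -> DataFrame}."""
    # Flatten: one DataFrame per sheet, across all workbooks
    frames = [frame
              for filename, sheets in workbooks.items()
              for frame in tag_sheets(sheets, filename)]
    return pd.concat(frames, ignore_index=True)

# Hypothetical stand-in for two workbooks read with sheet_name=None
workbooks = {
    "a": {"Sheet1": pd.DataFrame({"date": ["2020-01-01"], "value": [1]})},
    "b": {"Sheet1": pd.DataFrame({"date": ["2020-01-02"], "value": [2]}),
          "Sheet2": pd.DataFrame({"date": ["2020-01-03"], "value": [3]})},
}
final = combine(workbooks)
```

The result has one row per original row, plus `sheetname` and `filename` columns that record where each row came from.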
# after opening the Excel file as a DataFrame
df = pd.read_excel(...)
keep_cols = ['date', 'value']
df = df[keep_cols]  # keep only the selected columns; the result is still a DataFrame
df.to_excel(...)
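To apply this to every file in a folder, as the question asks, the same selection can be wrapped in a loop. This is a rough sketch, not a tested solution for your data. It assumes the column headers are exactly `date` and `value`, that only the first sheet of each workbook matters, and that overwriting the original files is acceptable. The function names and the `excel_files` folder are placeholders.

```python
from pathlib import Path

import pandas as pd

def keep_columns(df, keep_cols):
    """Return a copy of df containing only the columns in keep_cols."""
    return df[list(keep_cols)].copy()

def strip_columns_in_folder(folder, keep_cols):
    """Rewrite every .xlsx file in folder so it holds only keep_cols.

    Reads the first sheet only and overwrites the file in place;
    any cell formatting in the original workbook is not preserved.
    """
    for path in Path(folder).glob('*.xlsx'):
        df = pd.read_excel(path)
        # index=False avoids writing the row index as an extra column
        keep_columns(df, keep_cols).to_excel(path, index=False)
```

Calling `strip_columns_in_folder('excel_files', ['date', 'value'])` would then process the whole folder. It is worth running it on a copy of the folder first, since the originals are overwritten.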