Delete columns in multiple Excel spreadsheets

Time: 2020-03-02 09:09:49

Tags: python pandas dataframe

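In Python, can I delete columns across multiple Excel files? I have a folder containing several xlsx files. Each file has about 5 columns (date, value, latitude, longitude, region). I want to delete every column except date and value in each Excel file.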

2 answers:

Answer 0 (score: 4)

Suppose you have a folder containing multiple Excel files:

from pathlib import Path

import pandas as pd

folder = Path('excel_files')

xlsx_only_files = list(folder.rglob('*.xlsx'))


def process_files(xls_file):

    # stem is a pathlib attribute that gives just the file name,
    # without the parent directory or the suffix
    filename = xls_file.stem

    # sheet_name=None reads every sheet into a dictionary,
    # with the sheet name as the key
    # usecols reads in only the relevant columns
    df = pd.read_excel(xls_file, usecols=['date', 'value'], sheet_name=None)

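    # tag each sheet's rows with the sheet and file they came from
    df_cleaned = [data.assign(sheetname=sheetname,
                              filename=filename)
                  for sheetname, data in df.items()
                 ]

    # combine the sheets into a single DataFrame, so the outer
    # pd.concat receives DataFrames rather than lists
    return pd.concat(df_cleaned, ignore_index=True)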


combo = [process_files(xlsx) for xlsx in xlsx_only_files]

final = pd.concat(combo, ignore_index = True)

Let me know how it goes.
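The tag-and-combine step in this answer can be tried without any Excel files on disk. The sketch below uses a hand-built dictionary as a stand-in for what `pd.read_excel(..., sheet_name=None)` returns (sheet name mapped to DataFrame). The sample data and the helper name `tag_and_stack` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None):
# a dict mapping sheet name -> DataFrame.
sheets = {
    "Sheet1": pd.DataFrame({"date": ["2020-01-01", "2020-01-02"],
                            "value": [1.0, 2.0]}),
    "Sheet2": pd.DataFrame({"date": ["2020-02-01"],
                            "value": [3.0]}),
}


def tag_and_stack(sheets, filename):
    """Label each sheet's rows with their origin, then stack them into one frame."""
    tagged = [
        data.assign(sheetname=sheetname, filename=filename)
        for sheetname, data in sheets.items()
    ]
    # pd.concat expects an iterable of DataFrames, which is what `tagged` is.
    return pd.concat(tagged, ignore_index=True)


combined = tag_and_stack(sheets, filename="example")
print(combined)
```

The result has one row per original row, with two extra columns recording the sheet and file each row came from.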


Answer 1 (score: -1)

I suggest defining the columns you want to keep as a list, then selecting them into a new DataFrame.

# after opening the Excel file as a DataFrame:

df = pd.read_excel(...)

keep_cols = ['date', 'value']
df = df[keep_cols]  # keep only the selected columns; the result is still a DataFrame

df.to_excel(...)
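For the question as asked (trim every file in a folder), the select-then-write idea could be looped over the files roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `keep_only`, `trim_workbook` and `trim_folder` are hypothetical, the header names `date` and `value` are taken from the question, and writing `.xlsx` assumes an Excel engine such as openpyxl is installed. It writes trimmed copies to a separate folder rather than overwriting the originals.

```python
from pathlib import Path

import pandas as pd

KEEP_COLS = ["date", "value"]  # assumed header names, taken from the question


def keep_only(df, keep_cols=KEEP_COLS):
    """Return a copy of df holding only the columns in keep_cols that exist."""
    present = [col for col in keep_cols if col in df.columns]
    return df[present].copy()


def trim_workbook(src, dst, keep_cols=KEEP_COLS):
    """Read every sheet of src, drop unwanted columns, write the result to dst."""
    sheets = pd.read_excel(src, sheet_name=None)  # dict: sheet name -> DataFrame
    with pd.ExcelWriter(dst) as writer:
        for sheet_name, df in sheets.items():
            keep_only(df, keep_cols).to_excel(
                writer, sheet_name=sheet_name, index=False
            )


def trim_folder(in_dir, out_dir, keep_cols=KEEP_COLS):
    """Trim every .xlsx file directly inside in_dir into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.xlsx")):
        trim_workbook(src, out_dir / src.name, keep_cols)
```

A call such as `trim_folder("excel_files", "excel_files_trimmed")` (both folder names are placeholders) would process the whole folder. Note that pandas rewrites only the cell values, so formatting and formulas in the original workbooks are not carried over.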