Question

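I have 500 Excel files. From each file I have to skip the first 4 rows and select a few columns. I can either create a new Excel file containing only those columns for each input file, or push the data into SQL Server.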

I need to write a function that reads all the files, performs the required processing, and writes the output to Excel or SQL.

Answer 1

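The os library is convenient for working with the file system. The function clean_one comes from your code, with only minor changes. The function clean_all applies clean_one to the files in the root directory (in my code that is os.getcwd(), the current working directory):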

import os
import pandas as pd

def clean_one(path, n):
    # Skip the 4 leading rows so the 5th row is used as the header
    df = pd.read_excel(path, skiprows = 4)
    # Columns to keep ('Net Salary' was listed twice in the original list)
    col_list = ['Emp Code', 'Emp Name', 'Net Salary', 'Gross Earnings', 'Provident Fund',
                'Provident Fund_A', 'Profession Tax', 'ESIC Deduction', 'ESIC Deduction_A',
                'Gross Deductions', 'Salary Bank', 'Salary Account No',
                'IFSC Code', 'PAN', 'Location', 'PF_Membership_No', 'State For PT']
    # Write only the selected columns to a new, numbered workbook
    df.to_excel('File_%d.xlsx' % n, columns = col_list)

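def clean_all(root):
    # os.listdir also returns scripts and subfolders, so keep only Excel workbooks
    names = sorted(f for f in os.listdir(root) if f.lower().endswith(('.xlsx', '.xls')))
    for n, filename in enumerate(names):
        path = os.path.join(root, filename)
        clean_one(path, n)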

if __name__ == "__main__":
    root = os.getcwd() # Replace it with necessary directory
    clean_all(root)
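
If you would rather push the data into SQL than write new Excel files, something along these lines might work. This is only a sketch under assumptions: an in-memory SQLite connection stands in for SQL Server so the example runs on its own, and the function name append_to_table, the table name 'salaries' and the shortened WANTED list are hypothetical. For SQL Server you would typically pass a SQLAlchemy engine as con instead of the sqlite3 connection.

```python
import sqlite3
import pandas as pd

# Hypothetical short subset of col_list, kept small for illustration
WANTED = ['Emp Code', 'Emp Name', 'Net Salary']

def append_to_table(df, conn, table='salaries', columns=WANTED):
    # Keep only the wanted columns and append them to the table;
    # if_exists='append' creates the table on the first call.
    df[columns].to_sql(table, con=conn, if_exists='append', index=False)

if __name__ == "__main__":
    conn = sqlite3.connect(':memory:')  # assumption: stand-in for SQL Server
    # In real use this frame would come from pd.read_excel(path, skiprows=4)
    sample = pd.DataFrame({'Emp Code': [1, 2],
                           'Emp Name': ['A', 'B'],
                           'Net Salary': [100.0, 200.0],
                           'Extra': ['x', 'y']})
    append_to_table(sample, conn)
    print(pd.read_sql('SELECT * FROM salaries', conn))
```

Calling append_to_table once per file inside the loop of clean_all would collect all 500 files in a single table, which may be easier to query than 500 separate workbooks.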
