Question

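I'm working on a way to combine the contents of several Excel files into a single file, but I have a couple of specific requirements. I need to drop the first 21 rows of every file except the first, and I also need to drop every row where column "E" is empty.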

import pandas as pd
import glob

#all files in directory (NOT SURE IF I CAN OPTIMIZE THE CODE WITH THIS)
#AM NOT USING THIS LINE AT THE MOMENT
#excel_names = glob.glob('*JAN_2019-jan.xlsx')

# filenames
excel_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx"]

# read them in
excels = [pd.ExcelFile(name) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in 
excels]

# delete the first row for all frames except the first (NOT WORKING)
# i.e. remove the header row -- assumes it's the first (NOT WORKING)
frames[21:] = [df[21:] for df in frames[21:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("c.xlsx", header=False, index=False)

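The output file is created and the data is concatenated, but the first 21 rows of the files after the first one are not removed. I would also like help working out how to delete every row where column "E" is blank.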

Thanks a lot, everyone.

Answer 1

To drop the first 21 rows, you can do this:

frames = [df.iloc[21:, :] for df in frames]

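And to drop every row that has NaN in column E, you can do this:

# The sheets are parsed with header=None, so the columns are labelled 0, 1, 2, ...
# Spreadsheet column E is therefore label 4; there is no column named "E".
combined.dropna(subset=[4], inplace=True)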
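As a minimal sketch of that step: a sheet parsed with header=None gets integer column labels, so spreadsheet column E is reached through label 4 rather than the string "E". The sample rows below are made up for illustration and are not from the question's files.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for a sheet parsed with header=None:
# the columns are labelled 0..4, so spreadsheet column E is label 4.
df = pd.DataFrame([
    ["a", 1, 2, 3, "x"],
    ["b", 4, 5, 6, np.nan],  # blank cell in column E
    ["c", 7, 8, 9, "z"],
])

# Keep only the rows that have a value in column E (label 4).
cleaned = df.dropna(subset=[4])
print(cleaned[0].tolist())
```

If the sheets were read with a header row instead, the label passed to subset would be whatever that header names the column.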

Your final code would then look like this:

import pandas as pd
import glob

#all files in directory (NOT SURE IF I CAN OPTIMIZE THE CODE WITH THIS)
#AM NOT USING THIS LINE AT THE MOMENT
#excel_names = glob.glob('*JAN_2019-jan.xlsx')

# filenames
excel_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx"]

# read them in
excels = [pd.ExcelFile(name) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in 
excels]

# drop the first 21 rows of every frame (this includes the first frame)
frames = [df.iloc[21:, :] for df in frames]

# concatenate them..
combined = pd.concat(frames)
# header=None gives integer column labels, so spreadsheet column E is label 4
combined.dropna(subset=[4], inplace=True)

# write it out
combined.to_excel("c.xlsx", header=False, index=False)

To drop the first 21 rows from every data frame except the first, you can do this instead:

frames_2 = [df.iloc[21:, :] for df in frames[1:]]
#And combine them separately
combined = pd.concat([frames[0], *frames_2])
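The whole flow (keep the first frame intact, trim the others, concatenate, then drop rows with a blank column E) can be sketched on in-memory data. This is only an illustration under stated assumptions: make_frame is a hypothetical helper, the rows are invented, and SKIP is set to 2 instead of 21 to keep the demo short.

```python
import numpy as np
import pandas as pd

SKIP = 2  # stands in for the 21 rows in the question

def make_frame(tag):
    """Hypothetical sheet: SKIP leading rows, then two data rows, columns A..E."""
    rows = [[f"{tag}-pre{i}", 0, 0, 0, "hdr"] for i in range(SKIP)]
    rows.append([f"{tag}-r1", 1, 2, 3, "ok"])
    rows.append([f"{tag}-r2", 4, 5, 6, np.nan])  # blank in column E
    return pd.DataFrame(rows)

frames = [make_frame("f1"), make_frame("f2"), make_frame("f3")]

# Keep the first frame whole; drop the leading rows from all the others.
trimmed = [frames[0]] + [df.iloc[SKIP:] for df in frames[1:]]

# ignore_index=True renumbers the rows so the combined index has no duplicates.
combined = pd.concat(trimmed, ignore_index=True)

# The frames have no header row, so spreadsheet column E is label 4.
combined = combined.dropna(subset=[4])
print(combined[0].tolist())
```

Passing ignore_index=True is a choice made for this sketch; it does not change the written file when to_excel is called with index=False.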

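To exclude the rows where column E holds the character "-":

# column E is label 4 because the sheets were parsed with header=None
combined = combined[~combined[4].isin(['-'])]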
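A small sketch of that filter on made-up data. Stripping whitespace before the comparison is an extra assumption added here (a cell such as " - " would otherwise survive the filter); it is not part of the original answer.

```python
import pandas as pd

# Hypothetical rows; column E is label 4 because there is no header row.
df = pd.DataFrame([
    ["a", 1, 2, 3, "x"],
    ["b", 4, 5, 6, "-"],
    ["c", 7, 8, 9, " - "],  # same placeholder with surrounding spaces
])

# Exact match only: removes "-" but keeps " - ".
exact = df[~df[4].isin(["-"])]

# Normalise to stripped strings first so " - " is treated like "-".
col_e = df[4].astype(str).str.strip()
stripped = df[~col_e.isin(["-"])]

print(exact[0].tolist(), stripped[0].tolist())
```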

Answer 2

To drop rows 2 through 21 (positions 1 to 20, since the index starts at 0): df.drop(df.index[1:21])

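To drop every row that has a null value in column "E": df.dropna(subset=['E']) (if the sheet was read with header=None, use the integer label 4 instead of 'E').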
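As a rough sketch of dropping a contiguous block of rows by position: a slice of df.index selects every label in the range, whereas a list such as [1, 20] would select only those two rows. The single "value" column below is invented for the demo.

```python
import pandas as pd

# Hypothetical frame with 30 rows so the effect is easy to see.
df = pd.DataFrame({"value": range(30)})

# Positions 1..20 are rows 2 through 21 when counting from 1.
trimmed = df.drop(df.index[1:21])

# A list of two positions removes just those two rows, not the range between them.
only_two = df.drop(df.index[[1, 20]])

print(len(trimmed), len(only_two))
```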
