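I am working on a way to combine the data from several Excel files into a single file, with two specific requirements. I need to drop the first 21 rows of every file except the first, and I need to drop every row whose column "E" is empty.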
import pandas as pd
import glob
#all files in directory (NOT SURE IF I CAN OPTIMIZE THE CODE WITH THIS)
#AM NOT USING THIS LINE AT THE MOMENT
#excel_names = glob.glob('*JAN_2019-jan.xlsx')
# filenames
excel_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in
excels]
# delete the first row for all frames except the first (NOT WORKING)
# i.e. remove the header row -- assumes it's the first (NOT WORKING)
frames[21:] = [df[21:] for df in frames[21:]]
# concatenate them..
combined = pd.concat(frames)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)
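The file is created and the sheets are concatenated, but the first 21 rows of the files after the first are not removed. I also need help working out how to drop every row where column "E" is blank.
Thanks very much, everyone.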
Answer 0 (score: 1)
To drop the first 21 rows of every frame, you can do this:
frames = [df.iloc[21:, :] for df in frames]
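For context on why the line in the question had no effect: `frames[21:]` slices the list of DataFrames starting at its 22nd element, and with only three files that slice is empty, so nothing gets reassigned. A minimal sketch, using small made-up frames rather than the asker's files:

```python
import pandas as pd

# Three stand-in frames of 25 rows each (hypothetical data).
frames = [pd.DataFrame({0: range(25)}) for _ in range(3)]

# The question's line slices the list, not the rows: with 3 frames,
# frames[21:] is empty, so the assignment changes nothing.
untouched = frames[21:]
frames[21:] = [df[21:] for df in frames[21:]]
lengths_after_original = [len(df) for df in frames]

# Slicing the rows of each frame is what actually drops rows.
trimmed = [df.iloc[21:, :] for df in frames]
lengths_after_iloc = [len(df) for df in trimmed]
```

Here every frame still has 25 rows after the original line, while the `iloc` version leaves 4 rows per frame.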
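To drop every row that has a NaN value in column E, you can do this. Because the sheets are parsed with header=None, the columns are labelled 0, 1, 2, ... rather than A, B, C, so Excel's column E is label 4 (assuming the data starts in column A):
combined.dropna(subset=[4], inplace=True)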
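One thing to watch when selecting the column: a frame read with `header=None` has integer column labels, so there is no column literally named "E". A small sketch, using an invented five-column frame in place of the real data:

```python
import numpy as np
import pandas as pd

# Stand-in for a sheet read with header=None: columns are labelled 0..4.
combined = pd.DataFrame([
    ["a", 1, 2, 3, "x"],
    ["b", 1, 2, 3, np.nan],   # blank cell in Excel column E
    ["c", 1, 2, 3, "z"],
])

# Fifth column, i.e. Excel's column E (assumes the data starts in column A).
e_col = 4
cleaned = combined.dropna(subset=[e_col])
```

Blank Excel cells are read in as NaN, which is why `dropna` is enough for truly empty cells.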
Your final code would then look like this:
import pandas as pd
import glob
#all files in directory (NOT SURE IF I CAN OPTIMIZE THE CODE WITH THIS)
#AM NOT USING THIS LINE AT THE MOMENT
#excel_names = glob.glob('*JAN_2019-jan.xlsx')
# filenames
excel_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in
excels]
# drop the first 21 rows of every frame (this includes the first file;
# the variant further down leaves the first file intact)
frames = [df.iloc[21:, :] for df in frames]
# concatenate them..
combined = pd.concat(frames)
# drop rows whose column E is empty; with header=None, column E is label 4
combined.dropna(subset=[4], inplace=True)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)
To drop the first 21 rows from every DataFrame except the first, you can do this instead:
frames_2 = [df.iloc[21:, :] for df in frames[1:]]
#And combine them separately
combined = pd.concat([frames[0], *frames_2])
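Putting the two requirements together, the steps can be sketched as one helper. The function name `combine_frames` and its parameters are hypothetical, and the demonstration uses tiny in-memory frames with `skip_rows=2` rather than real files:

```python
import pandas as pd

def combine_frames(frames, skip_rows=21, e_col=4):
    """Keep the first frame whole, drop the leading rows of the rest,
    concatenate, then drop rows whose column `e_col` is empty."""
    if not frames:
        return pd.DataFrame()
    rest = [df.iloc[skip_rows:, :] for df in frames[1:]]
    combined = pd.concat([frames[0], *rest], ignore_index=True)
    return combined.dropna(subset=[e_col])

# Made-up data: two header-like rows, then data rows.
first = pd.DataFrame([["h1"] * 5, ["h2"] * 5, ["r1", 0, 0, 0, "ok"]])
second = pd.DataFrame([
    ["h1"] * 5,
    ["h2"] * 5,
    ["r2", 0, 0, 0, None],   # empty column E, should be dropped
    ["r3", 0, 0, 0, "ok"],
])
result = combine_frames([first, second], skip_rows=2)
```

With real files, the list of frames could come from something like `[pd.read_excel(name, header=None) for name in sorted(glob.glob(pattern))]`. Sorting matters there because `glob` does not guarantee an order, and the first file is the one that keeps its leading rows.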
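To exclude rows whose column E holds the character "-" (again using label 4 for column E, since the sheets are read with header=None):
combined = combined[~combined[4].isin(['-'])]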
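If cells that look empty actually contain a dash or only spaces, `dropna` will not catch them, since they are strings rather than NaN. One possible way to treat all of these as blank in a single pass, sketched on an invented one-column frame:

```python
import numpy as np
import pandas as pd

# Made-up column E (label 4) with a dash, a NaN and a whitespace-only cell.
combined = pd.DataFrame({4: ["x", "-", np.nan, "  ", "y"]})

col = combined[4]
# Compare on stripped text so "  " counts as empty; NaN is handled by isna().
as_text = col.astype(str).str.strip()
is_blank = col.isna() | as_text.isin(["", "-"])
filtered = combined[~is_blank]
```

Which placeholder values count as "empty" is an assumption here; the list passed to `isin` would need to match whatever the source files actually use.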
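Answer 1 (score: 0)
To drop rows 2 through 21 (positions 1 to 20, since indexing starts at 0): df.drop(df.index[1:21]). Note that df.index[[1, 20]] would select only the two rows at positions 1 and 20, not the range between them.
To drop every row with an empty value in column "E": df.dropna(subset=['E']). If the frame was read with header=None, the columns are labelled by integer, so use subset=[4] instead.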
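The difference between passing a list of positions and passing a slice to `df.index` is easy to miss. A short sketch on a made-up 30-row frame:

```python
import pandas as pd

df = pd.DataFrame({"E": range(30)})

# A list of positions picks exactly those two rows.
two_rows_dropped = df.drop(df.index[[1, 20]])

# A slice picks the whole run of positions 1 through 20.
twenty_rows_dropped = df.drop(df.index[1:21])
```

The first call leaves 28 rows, the second leaves 10 (position 0 plus positions 21 to 29).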