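I have an Excel file that contains many sheets (~100), each with 8 columns. I am trying to combine the first column ("Date") and the last column ("Prediction") from every sheet into a new Excel file. The new file should therefore have a single sheet made up of the "Date" column followed by one prediction column per source sheet. My plan was to read the file and then use pandas concat() to join the "Prediction" columns, but when I do that, Python generates a lot of NaNs. I am curious whether there is a better way to do this.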
**Sheet 1:**
Date col1 Col2 ..... Prediction1
01/01 9 5 5
02/01 3 7 5
**Sheet 2:**
Date col1 Col2 ..... Prediction2
01/01 9 5 4
02/01 3 7 6
Note: I am new to Python, so please explain your code.
Code:
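import pandas as pd

# Reading file (sheet_name=None returns a dict of {sheet name: DataFrame})
df = pd.read_excel('myexcel.xlsx', sheet_name=None)
# Combining files
excel_combine = pd.concat(df[frame] for frame in df.keys())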
Expected output:
Date Prediction1 Prediction2
01/01 5 4
02/01 5 6
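Answer 0 (score: 0)
This should give you a DataFrame in which every prediction column has been renamed. concat() does not always give the best result, so try merge() instead. See also the pandas documentation on this topic.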
import pandas as pd

# Path to the input workbook (placeholder)
file_name = 'input_file_name'
# Open the workbook; pd.ExcelFile is used here instead of xlrd.open_workbook,
# because xlrd 2.x no longer reads .xlsx files
bk = pd.ExcelFile(file_name)
# set counter to zero
n = 0
# loop through the sheet names
for i in bk.sheet_names:
    # read one sheet into a df at a time
    temp_df = pd.read_excel(bk, sheet_name=i)
    # set a new column name according to which sheet the prediction came from
    new_col_name = 'pred_' + i
    # rename the prediction column (assumed to be named 'Prediction' in every sheet;
    # adjust this and 'Date' to match your actual column names)
    temp_df.rename(columns={'Prediction': new_col_name}, inplace=True)
    # keep only the date and the renamed prediction, so the other columns
    # (col1, Col2, ...) do not collide during the merge
    temp_df = temp_df[['Date', new_col_name]]
    n += 1  # add one to counter each time a new sheet is processed
    if n == 1:
        # if this is the first loop, a DataFrame called df is created
        df = temp_df.copy()
    else:
        # if it is not the first loop, merge the temp_df with the df table.
        # A left join assumes every sheet covers the same dates; if they differ,
        # how='outer' keeps dates that appear in only some sheets.
        df = df.merge(temp_df, on='Date', how='left')
# check df if everything is there
df.info()  # info() prints its summary itself
print(df.head())
print(df.describe())
# write to excel (the .xlsx extension lets pandas pick the writer engine)
df.to_excel('your_file_name.xlsx', index=False)
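
As for why concat() produced so many NaNs in the question: by default it stacks frames row by row, and since each sheet's prediction column has a different name, every such column is empty for the rows that came from the other sheets. Below is a minimal sketch of that effect and of a column-wise alternative. The two small frames are made-up stand-ins for the sample sheets in the question, and it assumes that each date appears only once per sheet and that the prediction is always the last column.

```python
import pandas as pd

# Hypothetical stand-ins for two sheets. A real workbook could be loaded with
# pd.read_excel('myexcel.xlsx', sheet_name=None), which returns {sheet name: DataFrame}.
sheets = {
    "Sheet1": pd.DataFrame({"Date": ["01/01", "02/01"], "col1": [9, 3],
                            "Col2": [5, 7], "Prediction1": [5, 5]}),
    "Sheet2": pd.DataFrame({"Date": ["01/01", "02/01"], "col1": [9, 3],
                            "Col2": [5, 7], "Prediction2": [4, 6]}),
}

# Row-wise concat (the default, axis=0) stacks the sheets on top of each other.
# Prediction1 and Prediction2 have different names, so each one is NaN in the
# rows that came from the other sheet.
stacked = pd.concat(list(sheets.values()), ignore_index=True)

# Column-wise concat: use Date as the index, take the last column of each sheet,
# then place the resulting Series side by side, aligned on Date.
pred_columns = [frame.set_index("Date").iloc[:, -1] for frame in sheets.values()]
combined = pd.concat(pred_columns, axis=1).reset_index()

print(stacked)
print(combined)
```

With axis=1, concat() aligns on the index using an outer join by default, so a date missing from one sheet would show up as NaN in that sheet's column only. If the sheets can contain repeated dates, the merge loop above is probably the safer choice.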