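I have an Excel file with multiple sheets that I need to merge, but the column headers differ from sheet to sheet. The data currently looks like this: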
Sheet 1
+-------------+--------------+----------+--------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 |
+-------------+--------------+----------+--------+---------+---------+
| 17 | Data | Data | 0 | 0 | 0 |
| 17 | Data | Data | 0 | 0 | 0 |
+-------------+--------------+----------+--------+---------+---------+
Sheet 2
+-------------+--------------+----------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header3 | Header2 |
+-------------+--------------+----------+---------+---------+
| 15 | Data | Data | 0 | 0 |
| 15 | Data | Data | 0 | 0 |
+-------------+--------------+----------+---------+---------+
Sheet 3
+-------------+--------------+----------+---------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header4 | Header1 | Header3 |
+-------------+--------------+----------+---------+---------+---------+
| 16 | Data | Data | 0 | 0 | 0 |
| 16 | Data | Data | 0 | 0 | 0 |
+-------------+--------------+----------+---------+---------+---------+
OUTPUT
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 | Header3 | Header4 | SheetName |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| 17 | Data | Data | 0 | 0 | 0 | null | null | Sheet1 |
| 17 | Data | Data | 0 | 0 | 0 | null | null | Sheet1 |
| 15 | Data | Data | null | null | 0 | 0 | null | Sheet2 |
| 15 | Data | Data | null | null | 0 | 0 | null | Sheet2 |
| 16 | Data | Data | null | 0 | null | 0 | 0 | Sheet3 |
| 16 | Data | Data | null | 0 | null | 0 | 0 | Sheet3 |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
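I am fairly new to Python. I have used pandas and NumPy. I have as many as 60 sheets. Can anyone help me understand how to achieve this? If not Python, what other tool or method should I use? A code example to get started would really help.
Thank you very much for your help. Thanks in advance.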
Answer 0 (score: 1)
This is easy to do with R.
library(openxlsx) # to read xlsx files
library(purrr) # for the "map" function
wb <- loadWorkbook("path/filename.xlsx")
all_sheets <- names(wb)
merged_data <- map_df(all_sheets, ~ read.xlsx(wb, sheet = .x))
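Answer 1 (score: 0)
Use a for loop and rbind in R:
library(xlsx) # read.xlsx with the sheetIndex argument comes from the xlsx package
data <- NULL # start empty so the first rbind simply returns the first sheet
for (i in file.list) {
  data <- rbind(data, read.xlsx(i, sheetIndex = 1))
}
rbind usage: to join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. Sheets whose headers differ, as in the question, would need their missing columns added before rbind accepts them.
total <- rbind(data_frameA, data_frameB)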
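Since this kind of vertical stacking expects every frame to carry the same columns, it can help to check which columns each sheet lacks before merging. Below is a minimal sketch in Python with pandas, to match the tools named in the question. The helper name `column_gaps` and the two sample frames are hypothetical and only illustrate the idea.

```python
import pandas as pd


def column_gaps(frames):
    """Return, for each named frame, the columns it lacks relative to the union."""
    # Build the union of column names, keeping first-seen order
    union = []
    for frame in frames.values():
        for col in frame.columns:
            if col not in union:
                union.append(col)
    # For each frame, list the union columns that are absent from it
    return {
        name: [col for col in union if col not in frame.columns]
        for name, frame in frames.items()
    }


# Hypothetical stand-ins for two of the sheets in the question
frames = {
    "Sheet1": pd.DataFrame({"FISCAL_YEAR": [17], "Header": [0], "Header2": [0]}),
    "Sheet2": pd.DataFrame({"FISCAL_YEAR": [15], "Header3": [0], "Header2": [0]}),
}

gaps = column_gaps(frames)
print(gaps)
```

With 60 sheets, a report like this makes it easier to spot misspelled headers before they turn into extra, mostly empty columns in the merged table.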
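Answer 2 (score: 0)
import pandas as pd

filepath = r"filePath here"
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel(filepath, sheet_name=None)

frames = []
# Loop through the sheets, tagging each row with the sheet it came from
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    # sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    frames.append(sheet)

# DataFrame.append was removed in pandas 2.0; concat aligns columns by name
# and fills the gaps with NaN
full_table = pd.concat(frames, ignore_index=True, sort=False)

# Write to Excel; the with-block saves the file on exit
with pd.ExcelWriter('consolidated_TB1.xlsx', engine='xlsxwriter') as writer:
    full_table.to_excel(writer, sheet_name='Sheet1')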
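To see the column alignment without needing an Excel file, here is a small sketch that runs on in-memory frames shaped like the three sheets in the question. The `merge_sheets` helper, the `KEYS` list, and the sample data are assumptions made for illustration. The column ordering (key columns first, then the remaining headers sorted, then `SheetName`) is one possible way to match the desired output, not something the original answer specifies.

```python
import pandas as pd

# Assumed key columns shared by every sheet (taken from the question's tables)
KEYS = ["FISCAL_YEAR", "COMPANY_CODE", "ACCOUNTS"]


def merge_sheets(sheets_dict):
    """Stack sheets vertically, aligning columns by name and tagging the source sheet."""
    # assign() returns a copy, so the input frames are left unmodified
    tagged = [df.assign(SheetName=name) for name, df in sheets_dict.items()]
    # Columns missing from a sheet are filled with NaN
    merged = pd.concat(tagged, ignore_index=True, sort=False)
    # Order: key columns, remaining headers alphabetically, SheetName last
    others = sorted(c for c in merged.columns if c not in KEYS + ["SheetName"])
    return merged[KEYS + others + ["SheetName"]]


def sample_sheet(year, headers):
    """Build a two-row hypothetical sheet with the given extra header columns."""
    data = {"FISCAL_YEAR": [year, year],
            "COMPANY_CODE": ["Data", "Data"],
            "ACCOUNTS": ["Data", "Data"]}
    data.update({h: [0, 0] for h in headers})
    return pd.DataFrame(data)


sheets = {
    "Sheet1": sample_sheet(17, ["Header", "Header1", "Header2"]),
    "Sheet2": sample_sheet(15, ["Header3", "Header2"]),
    "Sheet3": sample_sheet(16, ["Header4", "Header1", "Header3"]),
}

merged = merge_sheets(sheets)
print(merged)
```

With a real workbook, the dict returned by `pd.read_excel(filepath, sheet_name=None)` could be passed to the helper in place of `sheets`.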