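I have an Excel file that contains a column with the header "Original Translation". Depending on the language I am working with and a few processing steps, I also have a DataFrame with a column named "Original Translation - {language}".
My goal is to open the Excel file, overwrite the "Original Translation" column with all the data from my DataFrame's "Original Translation - {language}" column while preserving the original Excel file's formatting, and then save the result to a new output folder.
Here is the code I currently have: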
import pandas as pd
from openpyxl import load_workbook

def output_formatted_capstan_file(df, original_file, country, language):
    # this is where you generate the file:
    # https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
    try:
        print(original_file)
        book = load_workbook(original_file)
        writer = pd.ExcelWriter(original_file, engine='openpyxl')
        writer.book = book
        df.to_excel(writer, ['Original Translation - {}'.format(language)])
        writer.save()
    except Exception as exc:
        # Show the underlying error instead of hiding it
        print('Failed:', exc)
Answer 0 (score: 1)
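I would approach it as follows.
1) Import the Excel file with a function such as pandas.read_excel, which loads the data from Excel into a DataFrame. I'll call this exceldf.
2) Merge that DataFrame with the data you already have in your pandas DataFrame. I'll call your existing translated DataFrame translateddf.
3) Re-order the columns of the newly merged DataFrame, newdf, and then export the data. More options for re-ordering are shown here: re-ordering data frame
4) Export the data to Excel. I'll leave integrating this into your initial code to you; the export shown here is the generic one. For a more general answer to this question, others may want to look at the options built into pandas' to_excel.
Example code: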
import pandas

# Read in the Excel file (pass the path directly; sheet_name selects the worksheet)
exceldf = pandas.read_excel('your_xls_xlsx_filename', sheet_name='Sheet 1')

# translateddf is your existing DataFrame that holds the translated column.
# Create a new dataframe with your merged data, merging on 'key1'.
# We then drop the column of the original translation, as it should no longer be needed.
# I've included the rename argument in case you need it (as written, it maps the name to itself).
newdf = (
    exceldf.merge(translateddf, on='key1')
    .rename(columns={'Original Translation {language}': 'Original Translation {language}'})
    .drop(['Original Translation'], axis=1)
)

# Re-order your data.
# Note that if you renamed anything above, you have to update it here too
newdf = newdf[['0', '1', '2', 'Original Translation {language}']]

# An example export, that uses the generic implementation, not your specific code
newdf.to_excel("output.xlsx")
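
One caveat: to_excel writes a fresh workbook, so cell formatting from the original file is not carried over. If keeping the formatting matters, as the question asks, a different technique is to edit the workbook in place with openpyxl and assign only the cell values. The following is a minimal sketch, not a tested solution for your file. The function name overwrite_translation_column and the parameters output_dir, header and header_row are made up for illustration, and it assumes the header sits in row 1 of the active sheet and that the DataFrame rows are already in the same order as the Excel rows.

```python
from pathlib import Path

from openpyxl import load_workbook


def overwrite_translation_column(df, original_file, language, output_dir,
                                 header="Original Translation", header_row=1):
    """Hypothetical helper: copy df['Original Translation - <language>'] into
    the cells under `header` and save a copy of the workbook in output_dir."""
    source_column = "Original Translation - {}".format(language)

    book = load_workbook(original_file)
    sheet = book.active  # assumption: the data lives on the active sheet

    # Locate the target column by its header text.
    header_cells = next(sheet.iter_rows(min_row=header_row, max_row=header_row))
    target = None
    for cell in header_cells:
        if cell.value == header:
            target = cell.column  # 1-based column index
            break
    if target is None:
        raise ValueError("Header {!r} not found in row {}".format(header, header_row))

    # Assign values only; the style already stored on each cell is left alone.
    # Assumption: df rows line up one-to-one with the sheet rows below the header.
    for offset, value in enumerate(df[source_column].tolist(), start=1):
        sheet.cell(row=header_row + offset, column=target).value = value

    # Save under the same file name in the output folder; the original is untouched.
    output_path = Path(output_dir) / Path(original_file).name
    output_path.parent.mkdir(parents=True, exist_ok=True)
    book.save(str(output_path))
    return output_path
```

You would call it as overwrite_translation_column(df, original_file, language, 'output'). If the row order of the DataFrame can differ from the sheet, merge on a key column first (as in the example above) and write the values from the merged frame instead.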