Python pandas: write the resulting data to an xlsm file without losing the macros

Time: 2017-01-03 08:19:45

Tags: python excel vba pandas

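I have many Excel files that need to be compiled into a single Excel file, and the compiled data then has to be copied into a particular sheet of an existing macro-enabled Excel file (.xlsm).

I have solved the first part (compiling multiple Excel files into one). The resulting dataframe is saved in .csv format. The result file looks like this. (image: the resulting dataframe)

Up to this point there is no problem. The next step is where I am struggling.

I want to "copy and paste" the resulting dataframe into the existing macro-enabled Excel file (.xlsm), in the sheet "Source", under the corresponding headers. The existing Excel file looks like this. (image: target excel file)

As you can see in the picture above, I want to skip column A entirely, because the cells in that column are full of formulas. I want to write the resulting dataframe into columns B through Q of the existing Excel file. Before writing the data, however, I want to delete all existing data in those cells (everything except the cells in column A).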

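Basically I want to do the following:

  1. Delete all values in the cells of columns B through Q in the existing xlsm file (in the "Source" sheet)
  2. Write the new values from the resulting dataframe into columns B through Q
  3. Save the Excel file under the same name without losing the macros

Any feedback is much appreciated! Thanks!

Regards,

Arnold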

3 answers:

Answer 0 (score: 1)

I found the following solution based on openpyxl. I learned that xlsxwriter cannot open existing Excel files, so my approach uses openpyxl instead.

import pandas as pd
import openpyxl  # Excel reader/writer engine that pandas can use

# df_write is the DataFrame to write; it is assumed to be defined elsewhere.
# Note: assigning writer.book / writer.sheets works on the older pandas
# versions this answer was written for; newer releases may reject it.

book = openpyxl.load_workbook('input.xlsm', keep_vba=True)  # Load the existing .xlsm file, keeping its VBA project

with pd.ExcelWriter('output.xlsm', engine='openpyxl') as writer:  # Open a writer for the output file
    writer.book = book  # Hand over the input workbook
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # Hand over the worksheets
    writer.vba_archive = book.vba_archive  # Hand over the VBA information

    # Write column 'A' of the DataFrame into sheet 'Sheet1' (which may already
    # exist in input.xlsm), starting at row index 1, column index 0
    df_write.to_excel(writer, sheet_name='Sheet1', columns=['A'],
                      header=False, index=False,
                      startrow=1, startcol=0)

# Leaving the with block saves output.xlsm, so no explicit writer.save() is needed
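For readers on a recent pandas release, the same idea can probably be expressed without handing the workbook over manually. The following is only a sketch under stated assumptions: it assumes a pandas version whose ExcelWriter supports mode="a", if_sheet_exists="overlay" and forwards engine_kwargs to openpyxl.load_workbook in append mode; the function name write_into_macro_workbook is hypothetical. Unlike the code above, it modifies the workbook in place rather than writing a separate output file.

```python
import pandas as pd


def write_into_macro_workbook(df, path, sheet_name, startrow=1, startcol=1):
    """Write df into an existing sheet of an .xlsm workbook, in place.

    startrow and startcol are zero-based, so the defaults target cell B2.
    """
    with pd.ExcelWriter(
        path,
        engine="openpyxl",
        mode="a",                          # open the existing file instead of creating one
        if_sheet_exists="overlay",         # write into the existing sheet, keep its other cells
        engine_kwargs={"keep_vba": True},  # forwarded to openpyxl.load_workbook
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name,
                    header=False, index=False,
                    startrow=startrow, startcol=startcol)
```

Calling write_into_macro_workbook(df, "input.xlsm", "Source") would then place the first dataframe value in B2 and leave row 1 and column A as they were. Working on a copy of the workbook first is a sensible precaution.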

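Answer 1 (score: 0)

Since importing a csv file into a spreadsheet can be handled by an Excel VBA macro using QueryTables, consider having Python replicate the VBA through the COM interface to the Excel object library. All existing macro code stays intact, because nothing is overwritten except the cell data. Note: the following assumes you are using Excel for Windows.

With the win32com library, Python can replicate almost anything VBA does. In fact, you will find that VBA is itself an add-on reference in Office applications rather than a natively built-in object, and it talks to the same COM interface! See the first checked item under Tools\References in the IDE.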

import pandas as pd
import win32com.client as win32

# ...same pandas code...
macrofile = "C:\\Path\\To\\Macro\\Workbook.xlsm"
strfile = "C:\\Path\\To\\CSV\\Output.csv"
df.to_csv(strfile)

try:
    xl = win32.gencache.EnsureDispatch('Excel.Application')
    wb = xl.Workbooks.Open(macrofile)
    ws = wb.Sheets("Source")

    # DELETE PREVIOUS DATA (REMOVES COLUMNS B:Q ENTIRELY, HEADERS INCLUDED)
    ws.Range("B:Q").EntireColumn.Delete()

    # ADD QUERYTABLE (DESTINATION CELL MUST BE ON THE SAME SHEET)
    qt = ws.QueryTables.Add(Connection="TEXT;" + strfile,
                            Destination=ws.Cells(2, 2))
    qt.TextFileParseType = 1  # xlDelimited
    qt.TextFileConsecutiveDelimiter = False
    qt.TextFileTabDelimiter = False
    qt.TextFileSemicolonDelimiter = False
    qt.TextFileCommaDelimiter = True
    qt.TextFileSpaceDelimiter = False
    qt.Refresh(BackgroundQuery=False)

    # REMOVE QUERYTABLE
    for qt in ws.QueryTables:
        qt.Delete()

    # CLOSES WORKBOOK AND SAVES CHANGES, THEN QUITS EXCEL
    wb.Close(True)
    xl.Quit()

except Exception as e:
    print(e)

finally:
    qt = None
    wb = None
    xl = None
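The Delete call in the script above removes columns B:Q altogether, header row included. If the goal is only to empty the data cells, as the question describes, clearing the contents of a bounded range may be a better fit. The sketch below assumes ws is a worksheet COM object such as wb.Sheets("Source"); both helper names are hypothetical, and only the UsedRange, Range and ClearContents members of the Excel object model are used.

```python
def data_range_address(first_col="B", last_col="Q", first_row=2, last_row=2):
    """Build an A1-style address such as 'B2:Q40'."""
    return f"{first_col}{first_row}:{last_col}{last_row}"


def clear_data_block(ws, first_col="B", last_col="Q", first_row=2):
    """Empty the data cells but leave headers, formatting and column A alone.

    Returns the address that was cleared, or None if there was nothing to clear.
    """
    used = ws.UsedRange
    last_row = used.Row + used.Rows.Count - 1  # last row Excel considers in use
    if last_row < first_row:
        return None  # the sheet holds nothing below the header row
    address = data_range_address(first_col, last_col, first_row, last_row)
    ws.Range(address).ClearContents()  # clears values, does not shift columns
    return address
```

In the script, clear_data_block(wb.Sheets("Source")) would take the place of the EntireColumn.Delete() line.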

Alternatively, create a new macro in VBA (placed in a standalone module) and have Python call it, passing the csv file path as an argument:

VBA

Public Sub ImportCSV(strfile As String)
    Dim qt As QueryTable

    ThisWorkbook.Sheets("Source").Range("B:Q").EntireColumn.Delete

    ' ADD QUERYTABLE (DESTINATION MUST BE ON THE SAME SHEET)
    With ThisWorkbook.Sheets("Source").QueryTables.Add(Connection:="TEXT;" & strfile, _
        Destination:=ThisWorkbook.Sheets("Source").Cells(2, 2))
            .TextFileParseType = xlDelimited
            .TextFileConsecutiveDelimiter = False
            .TextFileTabDelimiter = False
            .TextFileSemicolonDelimiter = False
            .TextFileCommaDelimiter = True
            .TextFileSpaceDelimiter = False

            .Refresh BackgroundQuery:=False
    End With

    ' REMOVE QUERYTABLE
    For Each qt In ThisWorkbook.Sheets("Source").QueryTables
        qt.Delete
    Next qt

    Set qt = Nothing
End Sub

Python

import pandas as pd
import win32com.client as win32

# ...same pandas code...
macrofile = "C:\\Path\\To\\Macro\\Workbook.xlsm"
strfile = "C:\\Path\\To\\CSV\\Output.csv"
df.to_csv(strfile)

try:
    xl = win32.gencache.EnsureDispatch('Excel.Application')

    wb = xl.Workbooks.Open(macrofile)
    xl.Application.Run('ImportCSV', strfile)  # Run the VBA macro with the csv path

    wb.Close(True)  # True saves the changes
    xl.Quit()       # Quit is a method and must be called

except Exception as e:
    print(e)

finally:
    wb = None
    xl = None
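In the script above, an exception raised before the Quit call leaves an Excel process running in the background. One possible way to guard against that is a small context manager. This is a sketch, not part of the original answer: excel_application and run_import_macro are hypothetical names, and the dispatch parameter exists only so the logic can be exercised without Excel installed.

```python
from contextlib import contextmanager


@contextmanager
def excel_application(dispatch=None):
    """Yield an Excel.Application COM object and always quit it afterwards."""
    if dispatch is None:
        # Imported lazily: pywin32 is only available on Windows.
        import win32com.client as win32
        dispatch = win32.gencache.EnsureDispatch
    xl = dispatch("Excel.Application")
    try:
        yield xl
    finally:
        xl.Quit()  # runs even if the body raised


def run_import_macro(macrofile, strfile, dispatch=None):
    """Open the macro workbook, run ImportCSV on strfile, then save and close."""
    with excel_application(dispatch) as xl:
        wb = xl.Workbooks.Open(macrofile)
        xl.Application.Run("ImportCSV", strfile)
        wb.Close(True)  # True saves the changes
```

With this in place, the try/except/finally block reduces to a single call such as run_import_macro(macrofile, strfile).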

Answer 2 (score: 0)

Sorry for coming back a bit late to update my question. In the end I solved my problem with the openpyxl package.

So here is my final code:

import os

import numpy as np
import openpyxl
import pandas as pd

path = "..."         # folder directory
target_file = "..."  # excel filename
sheetname = "..."    # worksheet that you wish to work on

filename = os.path.join(path, target_file)

wb = openpyxl.load_workbook(filename, keep_vba=True)  # keep_vba preserves the macros
sheet = wb[sheetname]

# To Erase All Values within Selected Columns
# Row 1 (headers) and column 1 (formulas) are left untouched
max_row = sheet.max_row
max_col = sheet.max_column

for row in range(2, max_row + 1):
    for col in range(2, max_col + 1):
        sheet.cell(row=row, column=col).value = None

# To Write Values to the Blank Worksheet
path_compiled = "..."    # folder directory of the csv file
target_compiled = "..."  # csv filename
filename_compiled = os.path.join(path_compiled, target_compiled)

compiled = pd.read_csv(filename_compiled, low_memory=False, encoding="ISO-8859-1")

# Dataframe row i goes to sheet row i + 2 (row 1 holds the headers) and
# dataframe column j goes to sheet column j + 2 (column A holds formulas)
for i in range(len(compiled.index)):
    for j in range(len(compiled.columns)):
        value = compiled.iat[i, j]
        # Convert NumPy scalars to plain Python types before writing
        if isinstance(value, np.floating): value = float(value)
        elif isinstance(value, np.integer): value = int(value)
        sheet.cell(row=i + 2, column=j + 2).value = value

wb.save(filename)
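The clear-then-write logic above can also be wrapped in one reusable function that is bounded to a column range such as B through Q. The following is a minimal sketch using only openpyxl; the name replace_block and its parameters are hypothetical, and rows is assumed to be any iterable of row tuples, for example compiled.itertuples(index=False, name=None).

```python
from openpyxl.utils import column_index_from_string


def replace_block(ws, rows, first_col="B", last_col="Q", first_row=2):
    """Clear first_col..last_col from first_row down, then write rows there."""
    min_col = column_index_from_string(first_col)
    max_col = column_index_from_string(last_col)

    # Clear the old values; row 1 and columns outside the block are untouched.
    for cells in ws.iter_rows(min_row=first_row, max_row=ws.max_row,
                              min_col=min_col, max_col=max_col):
        for cell in cells:
            cell.value = None

    # Write the new values, ignoring anything beyond last_col.
    for r, row in enumerate(rows, start=first_row):
        for c, value in enumerate(row, start=min_col):
            if c > max_col:
                break
            # .item() turns a NumPy scalar into a plain Python value
            if hasattr(value, "item"):
                value = value.item()
            ws.cell(row=r, column=c).value = value
```

A call like replace_block(sheet, compiled.itertuples(index=False, name=None)) followed by wb.save(filename) would then cover steps 1 to 3 of the question, provided the workbook was loaded with keep_vba=True.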