I can open a password-protected Excel file with the following code:
import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1) # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1).Value) # that's A1
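I'm not sure how to transfer the information into a pandas DataFrame. Do I need to read the cells one by one, or is there a more convenient way?
Answer 0 (score: 4)
Assuming the starting cell is (StartRow, StartCol) and the ending cell is (EndRow, EndCol), I found that the following works for me: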
# Get the content in the rectangular selection region
# content is a tuple of tuples
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value
# Transfer content to pandas dataframe
dataframe = pandas.DataFrame(list(content))
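Note: Excel cell B5 is addressed in win32com as row 5, column 2. Also, we need list(...) to convert the tuple of tuples into a list of tuples, because pandas.DataFrame has no constructor for a tuple of tuples.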
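To make the row/column convention in the note concrete, here is a minimal sketch that converts 1-based (row, column) pairs, as passed to Cells(row, col), into the familiar A1-style address. The helper names column_letter and rowcol_to_a1 are hypothetical and are not part of win32com or pandas.

```python
def column_letter(col):
    """Convert a 1-based column number to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if col < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col > 0:
        # Excel columns are a bijective base-26 system, hence the col - 1.
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def rowcol_to_a1(row, col):
    """Return the A1-style address for a 1-based (row, col) pair."""
    if row < 1:
        raise ValueError("row numbers start at 1")
    return f"{column_letter(col)}{row}"


# Cells(5, 2) is row 5, column 2, i.e. cell B5.
print(rowcol_to_a1(5, 2))
```

This can serve as a quick sanity check on StartRow/StartCol and EndRow/EndCol before building the Range.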
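Answer 1 (score: 1)
Assuming you can save the decrypted file back to disk using the win32com API (which I realize may not work), you can then call the top-level pandas function read_excel directly. You will first need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003) and openpyxl (for Excel 2007). The pandas documentation describes how to read Excel files. Currently, pandas does not support reading Excel files through the win32com API. If you would like that, you are welcome to open up a GitHub issue.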
Answer 2 (score: 0)
From David Hamann's website (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/
With xlwings, opening the file first launches the Excel application so that you can enter the password.
import pandas as pd
import xlwings as xw
PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']
df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df
Answer 3 (score: 0)
Building on the suggestion from @ikeoddy, this should put the pieces together:
How to open a password protected excel file using python?
# Import modules
import pandas as pd
import win32com.client
import os
import getpass
# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'
full_name = os.path.join(file_path, file_name)
# print(full_name)
Getting command-line password input in Python
# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')
xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')
# Get last_row
row_num = 0
cell_val = ''
while cell_val is not None:
    row_num += 1
    cell_val = xl_sh.Cells(row_num, 1).Value
    # print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)
# Get last_column
col_num = 0
cell_val = ''
while cell_val is not None:
    col_num += 1
    cell_val = xl_sh.Cells(1, col_num).Value
    # print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)
ikeoddy's answer:
content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()
python win32 COM closing excel workbook
xl_wb.Close(False)
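The cell-by-cell scans for last_row and last_col make one COM call per cell. As a possible variation (not what the answer above does), you could read a generously sized block once and trim it in plain Python. The sketch below assumes content is a tuple of tuples like the one Range(...).Value returns for a multi-cell range, with empty cells as None. The function name split_header_and_rows is hypothetical. Unlike the loops above, it trims only trailing empty rows and columns instead of stopping at the first empty cell.

```python
def split_header_and_rows(content):
    """Split a tuple-of-tuples block into (header, data rows).

    Trailing rows that are entirely None are dropped, and trailing
    columns whose header cell is None are dropped.
    """
    rows = [list(row) for row in content]

    # Drop trailing rows in which every cell is empty.
    while rows and all(cell is None for cell in rows[-1]):
        rows.pop()
    if not rows:
        return [], []

    # Use the first row as the header and find its last non-empty cell.
    header = rows[0]
    width = len(header)
    while width > 0 and header[width - 1] is None:
        width -= 1

    data = [row[:width] for row in rows[1:]]
    return header[:width], data


# Hand-written sample shaped like a Range.Value result (assumed data).
sample = (
    ("name", "qty", None),
    ("a", 1.0, None),
    ("b", None, None),
    (None, None, None),
)
header, data = split_header_and_rows(sample)
print(header, data)
```

The result can then be handed to pandas as pd.DataFrame(data, columns=header), in the same way as the df line above.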
Answer 4 (score: 0)
Adding to @Maurice's answer, to get all the cells in a sheet without specifying a range:
wb = xw.Book(PATH, password='somestring')
sheet = wb.sheets[0] #get first sheet
#sheet.used_range.address returns string of used range
df = sheet[sheet.used_range.address].options(pd.DataFrame, index=False, header=True).value
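If you want the bounds of the used range as numbers (for example to log the shape before loading), the address string can be parsed in plain Python. This is a sketch that assumes the address looks like '$A$1:$C$4', or like '$A$1' for a single cell. The helper names column_number and parse_address are hypothetical and not part of xlwings.

```python
import re

# One cell reference, with optional '$' absolute markers, e.g. '$C$4' or 'C4'.
_CELL = re.compile(r"^\$?([A-Za-z]+)\$?(\d+)$")


def column_number(letters):
    """Convert Excel column letters to a 1-based number ('A' -> 1, 'AA' -> 27)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def parse_address(address):
    """Return ((first_row, first_col), (last_row, last_col)), all 1-based."""
    first, _, last = address.partition(":")
    last = last or first  # a single-cell address has no ':' part
    bounds = []
    for ref in (first, last):
        match = _CELL.match(ref)
        if match is None:
            raise ValueError(f"unrecognised cell reference: {ref!r}")
        bounds.append((int(match.group(2)), column_number(match.group(1))))
    return bounds[0], bounds[1]


print(parse_address("$A$1:$C$4"))
```

The second tuple gives the last row and last column directly, so no cell-by-cell scan is needed.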