From a password-protected Excel file to a pandas DataFrame

Time: 2013-03-08 01:15:46

Tags: python excel pandas

I can open a password-protected Excel file with the following:

import sys
import win32com.client

xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1)  # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1).Value)  # that's A1

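I'm not sure how to transfer the information into a pandas DataFrame. Do I need to read the cells one by one, or is there a convenient way to do this?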

5 answers:

Answer 0 (score: 4)

Assuming the starting cell is (StartRow, StartCol) and the ending cell is (EndRow, EndCol), I found that the following works for me:

import pandas

# Get the content of the rectangular selection.
# content is a tuple of tuples, one inner tuple per row.
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value

# Transfer content to a pandas DataFrame
dataframe = pandas.DataFrame(list(content))

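Note: Excel cell B5 is addressed in win32com as row 5, column 2. Also, list(...) is needed to convert the tuple of tuples into a list of tuples, because pandas.DataFrame has no constructor for a tuple of tuples.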
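The row and column convention in the note above can be sketched as a small converter. This is only an illustration using the standard library; the name `a1_to_row_col` is hypothetical and is not part of win32com or pandas.

```python
import re


def a1_to_row_col(address):
    """Convert an A1-style reference such as 'B5' or '$B$5' to (row, column), both 1-based."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?([0-9]+)", address)
    if match is None:
        raise ValueError(f"not a single-cell A1 reference: {address!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters.upper():
        # Column letters behave like base-26 digits with A=1 ... Z=26.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(digits), column


row, col = a1_to_row_col("B5")
print(row, col)
```

The resulting pair is in the order that `xlws.Cells(row, col)` expects, so `xlws.Cells(*a1_to_row_col("B5"))` would address the same cell as B5.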

Answer 1 (score: 1)

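Assuming you can save the encrypted file back to disk using the win32com API (which I realize might defeat the purpose), you could then immediately call the top-level pandas function read_excel. You'll first need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003), and openpyxl (for Excel 2007). The pandas documentation covers reading Excel files. Currently, pandas does not support using the win32com API to read Excel files. If you'd like, you're welcome to open up a GitHub issue.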
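As a rough sketch of the library pairing mentioned above, the reader library could be picked from the file extension. The helper name `engine_for` and the mapping table are assumptions made for illustration; only the pairing of xlrd with Excel 2003 files and openpyxl with Excel 2007 files comes from the answer.

```python
import os

# Assumed mapping, following the answer: xlrd for Excel 2003 (.xls),
# openpyxl for Excel 2007 (.xlsx).
ENGINE_BY_EXTENSION = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
}


def engine_for(path):
    """Return the name of the reader library suited to the file's extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in ENGINE_BY_EXTENSION:
        raise ValueError(f"no known Excel reader for extension {extension!r}")
    return ENGINE_BY_EXTENSION[extension]


print(engine_for("report.xlsx"))
```

In current pandas versions the result can be passed along as `pandas.read_excel(path, engine=engine_for(path))`, once an unprotected copy of the workbook exists on disk.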

Answer 2 (score: 0)

From David Hamann's website (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

With xlwings, opening the file first launches the Excel application so that you can enter the password.

import pandas as pd
import xlwings as xw

PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']

df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df

Answer 3 (score: 0)

Building on the suggestion from @ikeoddy, this should put the pieces together:

How to open a password protected excel file using python?

# Import modules
import pandas as pd
import win32com.client
import os
import getpass

# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'

full_name = os.path.join(file_path, file_name)
# print(full_name)

Getting command-line password input in Python

# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')

Workbooks.Open Method (Excel)

xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')

# Get last_row
row_num = 0
cell_val = ''
while cell_val is not None:
    row_num += 1
    cell_val = xl_sh.Cells(row_num, 1).Value
    # print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)

# Get last_column
col_num = 0
cell_val = ''
while cell_val is not None:
    col_num += 1
    cell_val = xl_sh.Cells(1, col_num).Value
    # print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)
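The two scanning loops above follow the same pattern: walk from index 1 until the first empty cell. A minimal sketch of that pattern, runnable without Excel, is shown here. The names `count_until_empty`, `limit`, and `fake_cells` are illustrative and not part of win32com; the dictionary merely stands in for a worksheet.

```python
def count_until_empty(get_value, limit):
    """Count consecutive non-empty cells, starting at 1-based index 1."""
    count = 0
    # Stop at the first empty (None) cell, or at the limit as a safety net.
    while count < limit and get_value(count + 1) is not None:
        count += 1
    return count


# Stand-in for a worksheet: values keyed by 1-based (row, column).
fake_cells = {
    (1, 1): "name", (1, 2): "score",
    (2, 1): "a", (2, 2): 1.0,
    (3, 1): "b", (3, 2): 2.0,
}

last_row = count_until_empty(lambda r: fake_cells.get((r, 1)), limit=100)
last_col = count_until_empty(lambda c: fake_cells.get((1, c)), limit=100)
print(last_row, last_col)
```

With a real worksheet the callables would be `lambda r: xl_sh.Cells(r, 1).Value` and `lambda c: xl_sh.Cells(1, c).Value`. Like the original loops, this stops at the first blank cell, so a gap in column 1 or row 1 truncates the detected range.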

ikeoddy's answer:

content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()
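The header handling above (first row as column names, remaining rows as data) can be sketched on a plain tuple of tuples, which is the shape that Range.Value returns for a multi-cell range. The helper name `split_header` and the sample data are hypothetical.

```python
def split_header(content):
    """Split a tuple of row tuples into (column names, list of data rows)."""
    if not content:
        return [], []
    header, *rows = content
    return list(header), [list(row) for row in rows]


# Sample data in the same shape as a multi-cell Range.Value result.
content = (("name", "score"), ("a", 1.0), ("b", 2.0))
columns, rows = split_header(content)
print(columns)
print(rows)
```

The two results map directly onto the constructor call used above: `pd.DataFrame(rows, columns=columns)`.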

python win32 COM closing excel workbook

xl_wb.Close(False)

Answer 4 (score: 0)

Adding to @Maurice's answer, to get all the cells in a sheet without specifying a range:

wb = xw.Book(PATH, password='somestring')
sheet = wb.sheets[0]  # get the first sheet

# sheet.used_range.address returns the address of the used range as a string
df = sheet[sheet.used_range.address].options(pd.DataFrame, index=False, header=True).value