我有很多HTML文件,我需要打开它们或将它们导入到单个Excel工作簿中,然后简单地保存工作簿。每个HTML文件都应位于工作簿中其自己的工作表上。
我现有的代码不起作用,它在workbook.Open(html)
行上崩溃了,很可能在随后的行上崩溃。在此主题的网上搜索找不到任何内容。
import win32com.client as win32
import pathlib as path
def save_html_files_to_worksheets(read_directory):
read_path = path.Path(read_directory)
save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
excel_app = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel_app.Workbooks.Add() # create a new excel workbook
indx = 1 # used to add new worksheets dependent on number of html files
for html in read_path.glob('*.html'): # loop through directory getting html files
workbook.Open(html) # open the html in the newly created workbook - this doesn't work though
worksheet = workbook.Worksheets(indx) # each iteration in loop add new worksheet
worksheet.Name = 'Test' + str(indx) # name added worksheets
indx += 1
workbook.SaveAs(str(save_path), 51) # win32com requires string like path, 51 is xlsx extension
excel_app.Application.Quit()
save_html_files_to_worksheets(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
如果有帮助,以下代码可以满足我的需求。它将把每个HTML文件转换成一个单独的Excel文件。我需要一个包含多个WorkSheets的Excel文件中的每个HTML文件。
import win32com.client as win32
import pathlib as path
def save_as_xlsx(read_directory):
read_path = path.Path(read_directory)
excel_app = win32.gencache.EnsureDispatch('Excel.Application')
for html in read_path.glob('*.html'):
save_path = read_path.joinpath(html.stem + '.xlsx')
wb = excel_app.Workbooks.Open(html)
wb.SaveAs(str(save_path), 51)
excel_app.Application.Quit()
save_as_xlsx(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
这是您可以使用的示例HTML文件的链接,该文件中的数据不是真实的:HTML Download Link
答案 0 :(得分:2)
一种解决方案是将HTML文件打开到一个临时工作簿中,然后从该工作簿中复制工作表到包含所有工作簿的工作簿中:
workbook = excel_app.Application.Workbooks.Add()
sheet = workbook.Sheets(1)
for path in read_path.glob('*.html'):
workbook_tmp = excel_app.Application.Workbooks.Open(path)
workbook_tmp.Sheets(1).Copy(Before=sheet)
workbook_tmp.Close()
# Remove the redundant 'Sheet1'
excel_app.Application.ShowAlerts = False
sheet.Delete()
excel_app.Application.ShowAlerts = True
答案 1 :(得分:0)
我相信with tmp_dob as (
select to_date('19900101', 'YYYYMMDD') as Birthday from dual union all
select to_date('19901231', 'YYYYMMDD') Birthday from dual union all
select to_date('20040229', 'YYYYMMDD') Birthday from dual union all
select to_date('20041231', 'YYYYMMDD') Birthday from dual union all
select to_date('20171231', 'YYYYMMDD') Birthday from dual union all
select to_date('20051231', 'YYYYMMDD') Birthday from dual
)
select Birthday,
add_months(birthday, 12 * (extract(year from sysdate) - extract(year from birthday)))
from tmp_dob;
将使您的工作更加轻松。
pandas
这里是一个示例,说明如何从Wikipedia html获取多个表并将其输入到Pandas DataFrame中并将其保存到磁盘。
pip install pandas
对于您的用例,应该可以执行以下操作:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2017"
wikitables = pd.read_html(url, header=0, attrs={"class":"wikitable"})
for idx,df in enumerate(wikitables):
df.to_csv('{}.csv'.format(idx),index=False)
**确保在import pathlib as path
import pandas as pd
def save_as_xlsx(read_directory):
read_path = path.Path(read_directory)
for html in read_path.glob('*.html'):
save_path = read_path.joinpath(html.stem + '.xlsx')
dfs_from_html = pd.read_html(html, header=0,)
for idx, df in enumerate(dfs_from_html):
df.to_excel('{}.xlsx'.format(idx),index=False)
函数中设置正确的html属性。
答案 2 :(得分:-1)
怎么样?
Sub From_XML_To_XL()
'UpdatebyKutoolsforExcel20151214
Dim xWb As Workbook
Dim xSWb As Workbook
Dim xStrPath As String
Dim xFileDialog As FileDialog
Dim xFile As String
Dim xCount As Long
On Error GoTo ErrHandler
Set xFileDialog = Application.FileDialog(msoFileDialogFolderPicker)
xFileDialog.AllowMultiSelect = False
xFileDialog.Title = "Select a folder [Kutools for Excel]"
If xFileDialog.Show = -1 Then
xStrPath = xFileDialog.SelectedItems(1)
End If
If xStrPath = "" Then Exit Sub
Application.ScreenUpdating = False
Set xSWb = ThisWorkbook
xCount = 1
xFile = Dir(xStrPath & "\*.xml")
Do While xFile <> ""
Set xWb = Workbooks.OpenXML(xStrPath & "\" & xFile)
xWb.Sheets(1).UsedRange.Copy xSWb.Sheets(1).Cells(xCount, 1)
xWb.Close False
xCount = xSWb.Sheets(1).UsedRange.Rows.Count + 2
xFile = Dir()
Loop
Application.ScreenUpdating = True
xSWb.Save
Exit Sub
ErrHandler:
MsgBox "no files xml", , "Kutools for Excel"
End Sub