Question

I'm a beginner, and I'm trying to extract data from an HTML table and save it as a CSV file. How should I go about it?

This is what I have done so far:

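from bs4 import BeautifulSoup
import os

# Work from the folder that holds the unzipped accounts files
os.chdir('/Users/adityavemuganti/Downloads/Accounts_Monthly_Data-June2018')
# Open the file in a with-block so the handle is closed after parsing
with open('Prod224_0055_00007464_20170930.html') as f:
    soup = BeautifulSoup(f, "html.parser")
Format = soup.prettify()
table = soup.find("table", attrs={"class": "details"})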

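This is the HTML file I am trying to scrape:

http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2019-08-03.zip (this is a zip file). I have unzipped it and read the contents into "soup" as shown above. Now I am trying to get the data in the tags into csv / xlsx format.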

Answer 1

pandas is the way to go here: read_html together with to_csv, or to_excel if you would rather output xlsx.

import pandas as pd

dataframes = pd.read_html('yoururlhere')
# Assuming there is only one table in the file, if not then you may need to do a little more digging
df = dataframes[0]

df.to_csv('filename.csv')
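
Since the file in the question is already on disk and the table of interest has class "details", the same idea can be narrowed with the attrs argument of read_html. What follows is only a sketch under assumptions: SAMPLE_HTML is made-up markup standing in for the real accounts file (whose layout I have not checked), and html_table_to_csv is a hypothetical helper name, not part of pandas. read_html also needs a parser such as lxml, or bs4 with html5lib, to be installed.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Made-up markup standing in for the real accounts file (an assumption).
SAMPLE_HTML = """
<table class="summary"><tr><th>x</th></tr><tr><td>0</td></tr></table>
<table class="details">
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Cash</td><td>100</td></tr>
  <tr><td>Debtors</td><td>250</td></tr>
</table>
"""


def html_table_to_csv(html, csv_path, table_class="details"):
    """Save the first table carrying table_class to csv_path; return the DataFrame."""
    # attrs restricts the match to tables with these HTML attributes.
    # read_html still returns a list, so take the first match.
    tables = pd.read_html(StringIO(html), attrs={"class": table_class})
    df = tables[0]
    # index=False keeps the DataFrame's row index out of the CSV.
    df.to_csv(csv_path, index=False)
    return df


# Write into a temporary folder so the demo leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), "details.csv")
df = html_table_to_csv(SAMPLE_HTML, out_path)
print(df)
```

For a file on disk, read_html also accepts a path directly, for example pd.read_html('Prod224_0055_00007464_20170930.html', attrs={"class": "details"}). If no table matches, read_html raises a ValueError, so it may be worth checking the class name against the actual file first.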

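If you would rather stay with BeautifulSoup, as in the question, the rows of the table can be walked by hand and written out with the standard csv module. Again this is a rough sketch, not a tested solution for the Companies House files: SAMPLE_HTML is invented markup, and table_rows and write_csv are hypothetical helper names.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented markup standing in for the real accounts file (an assumption).
SAMPLE_HTML = """
<table class="details">
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Cash</td><td>100</td></tr>
  <tr><td>Debtors</td><td>250</td></tr>
</table>
"""


def table_rows(html, table_class="details"):
    """Return the table's rows as lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        raise ValueError("no table with class %r found" % table_class)
    rows = []
    # find_all is recursive, so rows of any nested tables would be included too.
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def write_csv(rows, fileobj):
    """Write the rows to an open text file object in CSV format."""
    csv.writer(fileobj).writerows(rows)


rows = table_rows(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for a real file in this demo
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write to a real file, open it with open('filename.csv', 'w', newline='') and pass that to write_csv; the newline='' argument is what the csv module documentation recommends to avoid blank lines between rows on Windows.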