Can anyone help me with web scraping in Python? My code is below

Time: 2019-02-20 07:17:43

Tags: python html-table

import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.bacb.com/services/o.php?page=101127&by=state&state=CA&pagenum=3').text
soup = BeautifulSoup(html, 'lxml')
type(soup)
print(soup.prettify())
table_rows = table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

2 answers:

Answer 0 (score: 0)

You will need to run regular expressions over the data afterwards to clean it up.

Try:

import requests
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
html = requests.get('https://www.bacb.com/services/o.php?page=101127&by=state&state=CA&pagenum=3').text
soup = BeautifulSoup(html, 'html.parser')
table_rows = soup.find_all('tr')  # every <tr> element on the page

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]  # text of each cell in this row
    print(row)
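
The regular-expression step mentioned in this answer is not shown. The following is a minimal sketch, assuming the goal is only to normalise stray whitespace in the scraped cell text. The helper names `clean_cell` and `clean_row` and the sample row are hypothetical and do not come from the page being scraped.

```python
import re

def clean_cell(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r'\s+', ' ', text).strip()

def clean_row(cells):
    """Apply clean_cell to each cell of a scraped row."""
    return [clean_cell(cell) for cell in cells]

# Hypothetical row, shaped like the lists printed by the loop above
sample_row = ['\n  Jane   Doe\n', 'Los  Angeles ', ' CA']
print(clean_row(sample_row))
```

In the scraping loop, `print(clean_row(row))` could replace `print(row)`. What else needs cleaning depends on the actual page content.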

Answer 1 (score: 0)

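Your code is correct, except that you call `find_all` on `table` instead of `soup` (line 6). The name `table` is never defined, so `table.find_all('tr')` raises a `NameError`; use `soup.find_all('tr')` instead.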
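
If the intent behind the name `table` was to search only inside the table element, one option is to look the table up first. This is a sketch under assumptions: `SAMPLE_HTML` is made-up markup standing in for the downloaded page, `extract_rows` is a hypothetical helper, and the real page may contain several tables or a different structure.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded page (assumption)
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Jane Doe</td><td>Los Angeles</td></tr>
  <tr><td>John Roe</td><td>San Diego</td></tr>
</table>
"""

def extract_rows(html):
    """Return the <td> text of each row in the first <table> of the document."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')  # define `table` before using it
    if table is None:           # no table in the page
        return []
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:               # skip rows without <td>, such as header rows
            rows.append(cells)
    return rows

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Returning an empty list when no table is found avoids an `AttributeError` from calling `find_all` on `None`.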
