How to scrape a table from a web page with Beautiful Soup

Posted: 2018-10-22 09:07:15

Tags: python web-scraping beautifulsoup request

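Recently I've been scraping table data from a web page, and I want to store it in a database as a list in the format [country_name, currency1, currency2]. I tried the code below. It sort of works, but not completely. Please take a look and point me in the right direction, and also explain how to scrape multiple tables from a page. My code is: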

from bs4 import BeautifulSoup
import requests

url = 'https://www.x-rates.com/table/?from=USD&amount=1'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')

soup.prettify()  # note: this returns a formatted string, which is discarded here

table = soup.find("table")

for row in table.findAll('td'):
    #print(row)
    row = row.text
    print(row)
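The loop above walks every <td> in the first table, so the cells come out one per line with no row grouping. A minimal sketch of grouping cells by <tr> instead is shown here; it runs against a small inline HTML string, and that markup (including the `ratesTable` class and the header row) is a made-up stand-in for the real x-rates page, which may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the x-rates table; the real page may differ.
HTML = """
<table class="ratesTable">
  <tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr>
  <tr><td>Euro</td><td>0.868499</td><td>1.151412</td></tr>
  <tr><td>British Pound</td><td>0.766371</td><td>1.304850</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    # Collect only <td> cells, so the <th> header row yields an empty list.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # skip the header row
        rows.append(cells)

print(rows)
# [['Euro', '0.868499', '1.151412'], ['British Pound', '0.766371', '1.304850']]
```

Iterating over rows first and then over the cells inside each row is what produces one list per table row.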

The desired output is:

[['Euro', 0.868499, 1.151412], ['British Pound', 0.766371, 1.304850], ...]

2 Answers:

Answer 0 (score: 2)

You can try the following:

import requests, sqlite3
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.x-rates.com/table/?from=USD&amount=1').text, 'html.parser')
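# keep only the moduleContent <div>s that actually contain a table
tables = [i for i in d.find_all('div', {'class':'moduleContent'}) if i.find_all('table')]
# each row becomes a list of its <td> texts (or <th> texts for header rows); [0] keeps the first such div
new_results = [[[c.text for c in (j.find_all('td') if j.find('td') else j.find_all('th'))] for j in i.find_all('tr')] for i in tables][0]
# indices of the two header rows, both of which start with 'US Dollar'
a, b = [i for i, [a, *_] in enumerate(new_results) if a == 'US Dollar']
_table1, _table2 = new_results[a:b], new_results[b:]
print(_table1)
print(_table2)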
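The `a, b = [...]` unpacking only works when there are exactly two header rows; with one or three it raises `ValueError`. A minimal sketch of a more general split is below. It assumes, as in the output that follows, that every table starts with a header row whose first cell is 'US Dollar'; the function name `split_tables` is hypothetical.

```python
def split_tables(rows, header_first_cell="US Dollar"):
    """Split a flat list of rows into tables, starting a new table at each header row.

    Rows that appear before the first header row are dropped, matching the
    new_results[a:] slicing in the answer above.
    """
    tables = []
    for row in rows:
        if row and row[0] == header_first_cell:
            tables.append([row])      # a header row opens a new table
        elif tables:
            tables[-1].append(row)    # a data row joins the current table
    return tables


# Sample rows copied from the output below (shortened).
rows = [
    ['US Dollar', '1.00 USD', 'inv. 1.00 USD'],
    ['Euro', '0.871476', '1.147479'],
    ['US Dollar', '1.00 USD', 'inv. 1.00 USD'],
    ['Argentine Peso', '36.544638', '0.027364'],
]
table1, table2 = split_tables(rows)
print(table1[1], table2[1])
```

Like the original slicing, each resulting table keeps its header row as the first element.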

Output:

[['US Dollar', '1.00 USD', 'inv. 1.00 USD'], ['Euro', '0.871476', '1.147479'], ['British Pound', '0.770649', '1.297607'], ['Indian Rupee', '73.536237', '0.013599'], ['Australian Dollar', '1.409996', '0.709222'], ['Canadian Dollar', '1.311334', '0.762582'], ['Singapore Dollar', '1.379420', '0.724942'], ['Swiss Franc', '0.996487', '1.003526'], ['Malaysian Ringgit', '4.160462', '0.240358'], ['Japanese Yen', '112.733613', '0.008870'], ['Chinese Yuan Renminbi', '6.946052', '0.143967']]
[['US Dollar', '1.00 USD', 'inv. 1.00 USD'], ['Argentine Peso', '36.544638', '0.027364'], ['Australian Dollar', '1.409996', '0.709222'], ['Bahraini Dinar', '0.376000', '2.659574'], ['Botswana Pula', '10.648150', '0.093913'], ['Brazilian Real', '3.681469', '0.271631'], ['Bruneian Dollar', '1.379420', '0.724942'], ['Bulgarian Lev', '1.704458', '0.586697'], ['Canadian Dollar', '1.311334', '0.762582'], ['Chilean Peso', '680.032262', '0.001471'], ['Chinese Yuan Renminbi', '6.946052', '0.143967'], ['Colombian Peso', '3083.558854', '0.000324'], ['Croatian Kuna', '6.479910', '0.154323'], ['Czech Koruna', '22.521913', '0.044401'], ['Danish Krone', '6.501631', '0.153808'], ['Euro', '0.871476', '1.147479'], ['Hong Kong Dollar', '7.839297', '0.127562'], ['Hungarian Forint', '281.579342', '0.003551'], ['Icelandic Krona', '118.017224', '0.008473'], ['Indian Rupee', '73.536237', '0.013599'], ['Indonesian Rupiah', '15193.787932', '0.000066'], ['Iranian Rial', '42000.224971', '0.000024'], ['Israeli Shekel', '3.663392', '0.272971'], ['Japanese Yen', '112.733613', '0.008870'], ['Kazakhstani Tenge', '364.653789', '0.002742'], ['South Korean Won', '1132.406237', '0.000883'], ['Kuwaiti Dinar', '0.303762', '3.292053'], ['Libyan Dinar', '1.376491', '0.726485'], ['Malaysian Ringgit', '4.160462', '0.240358'], ['Mauritian Rupee', '34.652436', '0.028858'], ['Mexican Peso', '19.345938', '0.051690'], ['Nepalese Rupee', '118.209501', '0.008460'], ['New Zealand Dollar', '1.522915', '0.656636'], ['Norwegian Krone', '8.261674', '0.121041'], ['Omani Rial', '0.384500', '2.600780'], ['Pakistani Rupee', '133.126431', '0.007512'], ['Philippine Peso', '53.821565', '0.018580'], ['Polish Zloty', '3.739851', '0.267390'], ['Qatari Riyal', '3.640000', '0.274725'], ['Romanian New Leu', '4.066770', '0.245895'], ['Russian Ruble', '65.251561', '0.015325'], ['Saudi Arabian Riyal', '3.750000', '0.266667'], ['Singapore Dollar', '1.379420', '0.724942'], ['South African Rand', '14.283878', '0.070009'], ['Sri Lankan Rupee', '172.591703', '0.005794'], ['Swedish Krona', '9.000482', '0.111105'], ['Swiss Franc', '0.996487', '1.003526'], ['Taiwan New Dollar', '30.933465', '0.032327'], ['Thai Baht', '32.778995', '0.030507'], ['Trinidadian Dollar', '6.749499', '0.148159'], ['Turkish Lira', '5.681995', '0.175995'], ['Emirati Dirham', '3.672500', '0.272294'], ['British Pound', '0.770649', '1.297607'], ['Venezuelan Bolivar', '9.987500', '0.100125']]

To insert the data into a database:

conn = sqlite3.connect('results.db')
conn.execute("CREATE TABLE data (dollar text, usd text, inv_usd text)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", _table1+_table2)
conn.commit()
conn.close()
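Note that `_table1 + _table2` also inserts the two header rows, and the rates are stored as text. A minimal variation is sketched below, assuming you want numeric rates and no header rows. It uses an in-memory database so it can run standalone, and the column names (`currency`, `rate`, `inv_rate`) are illustrative choices, not taken from the answer.

```python
import sqlite3

# Sample rows in the same shape as the scraped output (header row first).
rows = [
    ['US Dollar', '1.00 USD', 'inv. 1.00 USD'],
    ['Euro', '0.871476', '1.147479'],
    ['British Pound', '0.770649', '1.297607'],
]

conn = sqlite3.connect(":memory:")  # use a file path such as 'results.db' to persist
# IF NOT EXISTS lets the script be re-run without a "table data already exists" error.
conn.execute(
    "CREATE TABLE IF NOT EXISTS data (currency TEXT, rate REAL, inv_rate REAL)"
)
# Skip the header row and store the rates as numbers rather than text.
records = [(name, float(rate), float(inv)) for name, rate, inv in rows[1:]]
with conn:  # commits on success, rolls back on an exception
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", records)

result = conn.execute("SELECT currency, rate FROM data ORDER BY rate").fetchall()
conn.close()
print(result)  # [('British Pound', 0.770649), ('Euro', 0.871476)]
```

Storing the rates as REAL means queries such as the `ORDER BY rate` above compare numbers instead of strings.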

Answer 1 (score: 0)

I made a few changes. Hope this helps.

from bs4 import BeautifulSoup
import requests

url = 'https://www.x-rates.com/table/?from=USD&amount=1'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')

# select the rates table by its CSS class, then iterate over the rows in its body
table = soup.find("table", {"class": "ratesTable"})
body = table.find('tbody')

data = []

for row in body.find_all('tr'):
    rowArr = []
    for td in row.find_all('td'):
        rowArr.append(td.text)
    data.append(rowArr)

print(data)
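The rows in `data` hold strings, while the question asks for [country_name, currency1, currency2] with numeric rates. A minimal conversion sketch follows; the function name `to_records` is hypothetical, and it assumes each data row has exactly three cells, as in the output of Answer 0.

```python
def to_records(data):
    """Turn rows of cell text into [name, float, float] lists.

    Rows that don't have exactly three cells, or whose rate cells aren't
    numeric (for example a '1.00 USD' header cell), are skipped.
    """
    records = []
    for row in data:
        if len(row) != 3:
            continue
        name, rate, inv_rate = (cell.strip() for cell in row)
        try:
            records.append([name, float(rate), float(inv_rate)])
        except ValueError:
            continue  # non-numeric cell, e.g. a header row
    return records


data = [['Euro', '0.871476', '1.147479'], ['British Pound', '0.770649', '1.297607']]
print(to_records(data))
# [['Euro', 0.871476, 1.147479], ['British Pound', 0.770649, 1.297607]]
```

To cover more than one table, the same conversion can be applied to the rows of each table in turn.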