I am trying to scrape the page below with BeautifulSoup, using the following code:
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup
import lxml

url = 'https://remittanceprices.worldbank.org/en/corridor/Australia/China'
page = urlopen(url)
bs = BeautifulSoup(page, "lxml")
print(bs.get_text())

# Print the text of every title field
all_links = bs.find_all("div", {"class": "views-field views-field-title"})
for link in all_links:
    content = link.get_text()
    print(content)

# Print the text of every mobile header
all_links = bs.find_all("div", {"class": "mobile-header"})
for link in all_links:
    content = link.get_text()
    print(content)
Could you give me some guidance on how to print/extract the data for all firms in the following format?
Firm|product|Fee|Exchange rate margin(%)|Total Cost Percent(%)|Total Cost(AUD)
Bank of China|28.00|5.77|19.77|39.54
ANZ Bank|32.00|4.39|20.39|40.78
Regards, Abacus
Answer 0 (score: 1)
import requests
from bs4 import BeautifulSoup

url = 'https://remittanceprices.worldbank.org/en/corridor/Australia/China'
# Note: verify=False disables TLS certificate verification
r = requests.get(url, verify=False)
soup = BeautifulSoup(r.text, 'lxml')

# Join each row's text nodes with "|" and split them back into a list of fields
rows = [i.get_text("|").split("|") for i in soup.select('#tab-1 .corridor-row')]
for row in rows:
    #a,b,c,d,e = row[2],row[15],row[18],row[21],row[25]
    #print(a,b,c,d,e,sep='|')
    print('{0[2]}|{0[15]}|{0[18]}|{0[21]}|{0[25]}'.format(row))

Output:
Citibank|0.00|1.53|1.53|3.06
Transferwise|5.05|-0.04|2.48|4.96
Western Union|5.00|1.19|3.69|7.38
MoneyGram|8.00|1.06|5.06|10.12
WorldRemit|7.99|1.30|5.30|10.60
Ria|10.00|0.84|5.84|11.68
Ceylon Exchange|10.00|1.37|6.37|12.74
Western Union|9.95|1.69|6.66|13.32
Orbit Remit|13.00|0.78|7.28|14.56
Money2anywhere|12.00|1.71|7.71|15.42
SUPAY|18.00|-1.24|7.76|15.52
Money Chain Foreign Exchange|18.00|-1.12|7.88|15.76
MoneyGram|15.00|1.30|8.80|17.60
Commonwealth Bank|22.00|3.43|14.43|28.86
Bank of China|28.00|1.50|15.50|31.00
ANZ Bank|24.00|4.51|16.51|33.02
National Australia Bank (NAB)|22.00|5.74|16.74|33.48
Bank of China|32.00|1.50|17.50|35.00
Commonwealth Bank|30.00|3.43|18.43|36.86
ANZ Bank|32.00|4.51|20.51|41.02
National Australia Bank (NAB)|30.00|5.74|20.74|41.48
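The fixed indices in the answer (2, 15, 18, 21, 25) depend on how many text nodes, including whitespace-only ones, each row happens to contain. As a rough sketch of a less position-sensitive variant, BeautifulSoup's stripped_strings can be used to drop whitespace-only strings before joining. The HTML below is a hypothetical stand-in for the real page: only the #tab-1 and .corridor-row selectors come from the answer, while the inner class names, the helper names extract_rows and to_pipe_lines, and the five-column header (the "product" column from the question is left out because the output above has five fields) are assumptions made for illustration. The built-in html.parser is used so the sketch does not need lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: only "#tab-1" and ".corridor-row"
# come from the answer; the inner class names are invented for illustration.
SAMPLE_HTML = """
<div id="tab-1">
  <div class="corridor-row">
    <div class="firm"> Bank of China </div>
    <div class="fee">28.00</div>
    <div class="margin">1.50</div>
    <div class="pct">15.50</div>
    <div class="total">31.00</div>
  </div>
  <div class="corridor-row">
    <div class="firm">ANZ Bank</div>
    <div class="fee">24.00</div>
    <div class="margin">4.51</div>
    <div class="pct">16.51</div>
    <div class="total">33.02</div>
  </div>
</div>
"""

# Assumed header: the question's header minus the "product" column.
HEADER = ["Firm", "Fee", "Exchange rate margin(%)",
          "Total Cost Percent(%)", "Total Cost(AUD)"]


def extract_rows(html):
    """Return one list of non-empty, stripped strings per corridor row."""
    soup = BeautifulSoup(html, "html.parser")
    return [list(row.stripped_strings)
            for row in soup.select("#tab-1 .corridor-row")]


def to_pipe_lines(rows):
    """Build pipe-delimited lines: the header first, then one line per row."""
    return ["|".join(HEADER)] + ["|".join(row) for row in rows]


rows = extract_rows(SAMPLE_HTML)
print("\n".join(to_pipe_lines(rows)))
```

On the live page each row probably still contains extra label strings, so some index selection would likely remain necessary; the sketch only shows how to remove the whitespace-only entries that inflate the indices.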