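I am just a beginner in Python.
I am trying to scrape data from a website and have managed to write the code below.
However, I am not sure how to proceed, because I cannot get the href attributes and therefore cannot follow each listing to collect its data. I also do not know HTML tags well, so I suspect I have not identified the tags correctly.
Here is my code: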
import requests
from bs4 import BeautifulSoup

urls = []
for i in range(1, 5):
    pages = "https://directory.singaporefintech.org/?p={0}&category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&featured_only=0&feature=1&perpage=20&sort=random".format(i)
    urls.append(pages)

Data = []
for info in urls:
    page = requests.get(info)
    soup = BeautifulSoup(page.content, 'html.parser')
    links = soup.find_all('a', attrs={'class': 'sabai-directory-title'})
    hrefs = [link['href'] for link in links]
The code above produces an empty list for hrefs. Any help would be greatly appreciated!
Thank you!
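Answer 0 (score: 0)
The code is fine; the class you are looking for does not exist on those pages. For example, after inspecting https://directory.singaporefintech.org/hello-world/?category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&featured_only=0&feature=1&perpage=20&sort=random, I replaced the sabai-directory-title class with comment-reply-link and, after adding a print statement, got results.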
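One way to check which classes a page actually uses is to count the class names per tag type. The sketch below does this on a small inline HTML fragment; SAMPLE_HTML and the helper classes_by_tag are hypothetical and only stand in for a downloaded page, not the real markup of the site.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page (assumption).
SAMPLE_HTML = """
<div class="sabai-directory-title">
  <a href="/directory/listing/example-one">Example One</a>
</div>
<div class="comment">
  <a class="comment-reply-link" href="/hello-world/#respond">Reply</a>
</div>
"""


def classes_by_tag(html, tag_name):
    """Count how often each CSS class appears on tags named tag_name."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for tag in soup.find_all(tag_name):
        # "class" is multi-valued, so Beautiful Soup returns it as a list.
        counts.update(tag.get("class", []))
    return counts


a_classes = classes_by_tag(SAMPLE_HTML, "a")
div_classes = classes_by_tag(SAMPLE_HTML, "div")
print("classes on <a>:", dict(a_classes))
print("classes on <div>:", dict(div_classes))
```

In this fragment the sabai-directory-title class shows up only on a div, which is why a search for an a tag with that class finds nothing.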
Answer 1 (score: 0)
Hi, I made a few changes to the code:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

urls = []
for i in range(1, 5):
    # The same base URL is appended four times, so the results below repeat.
    pages = "https://directory.singaporefintech.org"
    urls.append(pages)

Data = []
for info in urls:
    page = requests.get(info)
    soup = BeautifulSoup(page.content, 'html.parser')
    # The class is on the enclosing <div>, not on the <a> itself.
    links = soup.find_all('div', attrs={'class': 'sabai-directory-title'})
    for link in links:
        # Collect the href of every <a> with non-empty text inside the title div.
        Data.extend([a['href'] for a in link.find_all('a', href=True) if a.text])
pprint(Data)
Output:
['https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab']
Is this the output you were expecting?
Hope that helps!
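Because the same page is fetched four times, the list above contains every link four times. A minimal sketch of removing the repeats while keeping the first-seen order follows; the helper name unique_in_order is hypothetical, and the sample list is a shortened stand-in built from three of the links above.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    # dict preserves insertion order (Python 3.7+), so the keys act as an ordered set.
    return list(dict.fromkeys(items))


# Shortened stand-in for the scraped list, repeated as in the output above.
links = [
    "https://directory.singaporefintech.org/directory/listing/silent-eight",
    "https://directory.singaporefintech.org/directory/listing/moolahsense",
    "https://directory.singaporefintech.org/directory/listing/myfinb",
] * 4

unique_links = unique_in_order(links)
print(len(links), "links before,", len(unique_links), "after")
```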
Answer 2 (score: 0)
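You can use a CSS selector to scrape the links. The selector div.sabai-directory-title a matches any <a> tag inside a <div> with the class sabai-directory-title (I updated the URL; the one you gave returned an error page):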
from bs4 import BeautifulSoup
import requests
from pprint import pprint
r = requests.get('https://directory.singaporefintech.org/')
soup = BeautifulSoup(r.text, 'lxml')
hrefs = [a['href'] for a in soup.select('div.sabai-directory-title a')]
pprint(hrefs)
This prints:
['https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/incomlend',
'https://directory.singaporefintech.org/directory/listing/bizgrow',
'https://directory.singaporefintech.org/directory/listing/makerscut',
'https://directory.singaporefintech.org/directory/listing/soho-fintech',
'https://directory.singaporefintech.org/directory/listing/dxmarkets',
'https://directory.singaporefintech.org/directory/listing/fundrevo',
'https://directory.singaporefintech.org/directory/listing/money4money',
'https://directory.singaporefintech.org/directory/listing/onelyst',
'https://directory.singaporefintech.org/directory/listing/hearti-lab',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/arcadier',
'https://directory.singaporefintech.org/directory/listing/plmp-fintech-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/cash-in-asia',
'https://directory.singaporefintech.org/directory/listing/grc-systems',
'https://directory.singaporefintech.org/directory/listing/sendexpense',
'https://directory.singaporefintech.org/directory/listing/jinjerjade',
'https://directory.singaporefintech.org/directory/listing/hatcher',
'https://directory.singaporefintech.org/directory/listing/fintech-consortium']
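To see why the descendant selector works where the original find_all call did not, here is a small offline sketch. SAMPLE_HTML is a hypothetical fragment with example.com links, assumed to mirror the structure described above (the class on the div, the link nested inside it); it is not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the class sits on the <div>, the link is nested inside it.
SAMPLE_HTML = """
<div class="sabai-directory-title">
  <a href="https://example.com/directory/listing/alpha">Alpha</a>
</div>
<div class="sabai-directory-title">
  <a href="https://example.com/directory/listing/beta">Beta</a>
</div>
<div class="other"><a href="https://example.com/about">About</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looking for the class on the <a> itself finds nothing in this fragment.
direct = soup.find_all("a", attrs={"class": "sabai-directory-title"})

# The descendant selector matches <a> tags nested inside the titled <div>;
# [href] additionally skips anchors that have no href attribute.
nested = [a["href"] for a in soup.select("div.sabai-directory-title a[href]")]

print(len(direct), "direct matches")
print(nested)
```

The same selector could then be applied to each paginated URL in turn, assuming those pages return the listing markup rather than an error page.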