I'm trying to collect all the links from a gun listing (2 pages in this case), and print 1) the length and 2) the links themselves.
I'm getting the error message: 'list' object has no attribute 'select'
from bs4 import BeautifulSoup
import requests
import csv
import pandas
from pandas import DataFrame
import re
import os
import locale

os.environ["PYTHONIOENCODING"] = "utf-8"

page = 1
all_links = []
url = "https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page={}"

with requests.Session() as session:
    while True:
        print(url.format(page))
        res = session.get(url.format(page))
        soup = BeautifulSoup(res.content, 'html.parser')
        gun_details = soup.select('div.details')
        for link in gun_details.select('a'):
            all_links.append("https://www.gunstar.co.uk" + link['href'])
        if len(soup.select(".nav_next")) == 0:
            break
        page += 1
If I remove .content from the response, the response I get has no len().
If I use .text on the soup instead, select('div.details') gives a similar result to the above.
I'm sure I'm going wrong somewhere fairly simple, but I can't seem to see it. Is there a reason select and findAll don't work when trying to target a specific part of the HTML?
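For reference, soup.select() returns a list-like ResultSet, not a single tag, so calling .select('a') on that result raises exactly this AttributeError. A minimal sketch of the problem and two fixes, using an inline HTML snippet (hypothetical markup, not the live site):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one listing page
html = """
<div class="details"><a href="/gun-1">Gun 1</a></div>
<div class="details"><a href="/gun-2">Gun 2</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

gun_details = soup.select("div.details")  # a ResultSet (list subclass): it has no .select()

# Fix 1: call select_one on each element of the list
links = [div.select_one("a")["href"] for div in gun_details]

# Fix 2: use one compound CSS selector on the soup itself
links2 = [a["href"] for a in soup.select("div.details > a")]

print(links)   # ['/gun-1', '/gun-2']
print(links2)  # ['/gun-1', '/gun-2']
```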
Answer 0 (score: 2)
You can get the links from all the pages in different ways. Here is one approach, using a generator to achieve the same thing:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

link = "https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782"
base = "https://www.gunstar.co.uk"

def get_links(url):
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'lxml')
    for item in soup.select(".details > a"):
        yield urljoin(base, item['href'])
    next_page = soup.select_one(".gallery_navigation [rel='next']")
    if next_page:
        # resolve the next-page href against the base as well, in case it is relative
        yield from get_links(urljoin(base, next_page['href']))

if __name__ == '__main__':
    list_of_links = [elem for elem in get_links(link)]
    print(list_of_links)
Answer 1 (score: 1)
Try the following code.
from bs4 import BeautifulSoup
import requests
import csv
import pandas
from pandas import DataFrame
import re
import os
import locale

os.environ["PYTHONIOENCODING"] = "utf-8"

page = 1
url = "https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page={}"

with requests.Session() as session:
    while True:
        all_links = []
        print(url.format(page))
        res = session.get(url.format(page))
        soup = BeautifulSoup(res.content, 'html.parser')
        gun_details = soup.select('div.details')
        for link in gun_details:
            all_links.append("https://www.gunstar.co.uk" + link.select_one('a')['href'])
        print(all_links)
        if len(soup.select(".nav_next")) == 0:
            break
        page += 1
Output:
https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page=1
['https://www.gunstar.co.uk/mauser-m96-lightning-hunter-straight-pull-270-rifles/rifles/1083802', 'https://www.gunstar.co.uk/magtech-586-12-bore-gauge-pump-action/Shotguns/1083784', 'https://www.gunstar.co.uk/merkel-kr1-bolt-action-308-rifles/rifles/1083786', 'https://www.gunstar.co.uk/christensen-arms-r93-carbon-bolt-action-7-mm-rifles/rifles/1083788', 'https://www.gunstar.co.uk/voere-lbw-luxus-bolt-action-308-rifles/rifles/1083792', 'https://www.gunstar.co.uk/voere-2155-bolt-action-243-rifles/rifles/1083797', 'https://www.gunstar.co.uk/voere-2155-2155-synthetic-bolt-action-308-rifles/rifles/1083798', 'https://www.gunstar.co.uk/mauser-m96-lightning-hunter-straight-pull-7-mm-rifles/rifles/1083799', 'https://www.gunstar.co.uk/blaser-lrs2-straight-pull-308-rifles/rifles/1084397', 'https://www.gunstar.co.uk/remington-700-s-s-barrel-only-bolt-action-300-win-mag-rifles/rifles/1084432']
https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page=2
['https://www.gunstar.co.uk/pfeiffer-waffen-handy-hunter-sr2-single-shot-300-win-mag-rif/rifles/1084433', 'https://www.gunstar.co.uk/sabatti-10-22-mod-sporter-semi-auto-22-rifles/rifles/1084442', 'https://www.gunstar.co.uk/voere-lbw-m-sniper-rifle-bolt-action-308-rifles/rifles/1084454', 'https://www.gunstar.co.uk/snipersystems-zoom-gun-light-kit-lamping/Accessories/1130763']
Another way to get all the links:
from bs4 import BeautifulSoup
import requests
import csv
import pandas
from pandas import DataFrame
import re
import os
import locale

os.environ["PYTHONIOENCODING"] = "utf-8"

page = 1
all_links = []
url = "https://www.gunstar.co.uk/view-trader/global-rifle-snipersystems/58782?page={}"

with requests.Session() as session:
    while True:
        print(url.format(page))
        res = session.get(url.format(page))
        soup = BeautifulSoup(res.content, 'html.parser')
        gun_details = soup.select('div.details > a')
        for link in gun_details:
            all_links.append("https://www.gunstar.co.uk" + link['href'])
        if len(soup.select(".nav_next")) == 0:
            break
        page += 1

print(all_links)
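Both loops above stop paginating by relying on select returning an empty list once the .nav_next element is absent from the last page. A minimal sketch of that check, with inline HTML mimicking the site's pager (hypothetical markup):

```python
from bs4 import BeautifulSoup

# Hypothetical pager markup: only the last page lacks a .nav_next element
page_with_next = BeautifulSoup('<a class="nav_next" href="?page=2">Next</a>', "html.parser")
last_page = BeautifulSoup('<span class="nav_prev">Prev</span>', "html.parser")

print(len(page_with_next.select(".nav_next")))  # 1 -> keep paginating
print(len(last_page.select(".nav_next")))       # 0 -> break out of the loop
```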