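I am trying to use bs4 and json to extract some key information about auction lots from this page (https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai). I managed to get most of the information from the dictionary stored in the div (the fields listed in the for loop), but the price and the lot status are not stored in the same place. Whatever I try with select, select_one, find, or find_all, these values never appear in the printed result, as if they did not exist in the original markup. What am I doing wrong? Is there a limit to how deeply bs4 can descend into nested divs? Why does the price show up in the page source but not in the soup?
Here is my code; the problem is in the last two lines: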
from bs4 import BeautifulSoup
import requests
import json
page = requests.get(
'https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai'
)
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')
# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
result = data_prop.get('results')
# selecting items from dictionary and attributing values to each one of them
for i in range(len(result)):
    ids = result[i]['id']
    titles = result[i]['title']
    subtitles = result[i]['subtitle']
    favoriteCounts = result[i]['favoriteCount']
    auctionIds = result[i]['auctionId']
    biddingStartTimes = result[i]['biddingStartTime']
# extracting prices and lot status
prince_lot = container.find_all('div', class_='be-lot__price u-placeholder')
print(prince_lot)
Answer 0 (score: 0)
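The price and lot status are loaded from an external URL, https://www.catawiki.com/buyer/api/v1/lots/live?ids=, where ids= takes a comma-separated list of the IDs of the items on the page. That would explain why they are missing from the HTML that requests downloads: the be-lot__price u-placeholder div is only a placeholder there.
For example: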
from bs4 import BeautifulSoup
import requests
import json
page = requests.get('https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')
# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
results = data_prop.get('results')
lots = requests.get('https://www.catawiki.com/buyer/api/v1/lots/live?ids=' + ','.join(str(result['id']) for result in results)).json()
lots = {l['id']: l for l in lots['lots']}
# uncomment to see all data:
# print(json.dumps(lots, indent=4))
for result in results:
    ids = result['id']
    titles = result['title']
    subtitles = result['subtitle']
    favoriteCounts = result['favoriteCount']
    auctionIds = result['auctionId']
    biddingStartTimes = result['biddingStartTime']
    price = lots[int(ids)]['current_bid_amount']['EUR']
    closed = lots[int(ids)]['closed']
    print(titles)
    print('Price:', price)
    print('Closed:', closed)
    print('-' * 80)
Prints:
Katana - Tamahagane stem - 肥前国住廣任 Hizen kuni ju Hiroto met een NBTHK Hozon certificaat. - Japan - early Edo period, 17th century.
Price: 3700.0
Closed: True
--------------------------------------------------------------------------------
Yoroi (1) - Leather - Samurai - Japan - Meiji period (1868-1912)
Price: 2800.0
Closed: True
--------------------------------------------------------------------------------
Katana, Sword - Tamahagane steel - Rare KIKU engraved- NBTHK - 68cm Nagasa - Japan - 17th century
Price: 11000.0
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Benkei and Mii-Dera Bell - Japan - Edo Period (1600-1868)
Price: 1100
Closed: True
--------------------------------------------------------------------------------
Wakizashi - Steel - Mumei , toegeschreven aan 1e gen. Bizen Yokoyama Sukekane, Hoge kwaliteit montering ! - Japan - ca. 1850
Price: 2400.0
Closed: True
--------------------------------------------------------------------------------
Fuchikashira - Copper, Gold, Shakudo - Japan - Edo Period (1600-1868)
Price: 270
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Flowers - NBTHK Tokubetsu Kichio - Japan - Edo Period (1600-1868)
Price: 340.0
Closed: True
--------------------------------------------------------------------------------
Mengu/ Menpo - Lacquered metal - Lacquered metal facial samurai mask (menpo), with four-piece gorget (nodowa) with blue cords. - Japan - Meiji period (1868-1912)
Price: 390.0
Closed: True
--------------------------------------------------------------------------------
...and so on.
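
If you want to keep the lookup separate from the network calls, the joining step above could be sketched roughly like this. The helper names build_live_url and merge_live_data are my own, not part of any library, and the sample data at the bottom is made up for illustration. I am assuming the live endpoint keeps returning a 'lots' list with 'id', 'current_bid_amount' and 'closed' keys as in the code above, and that a lot may occasionally be missing from the response or have no EUR amount, in which case the sketch returns None instead of raising KeyError.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
LIVE_URL = 'https://www.catawiki.com/buyer/api/v1/lots/live'


def build_live_url(ids):
    """Return the live-lots URL for an iterable of lot IDs (hypothetical helper)."""
    joined = ','.join(str(i) for i in ids)
    # safe=',' keeps the commas literal, matching the URL format used above.
    return LIVE_URL + '?' + urlencode({'ids': joined}, safe=',')


def merge_live_data(results, live):
    """Attach price and closed status to each page result (hypothetical helper).

    `results` is the list taken from data-props; `live` is the decoded JSON
    from the live endpoint, assumed to look like {'lots': [{...}, ...]}.
    """
    lots_by_id = {int(lot['id']): lot for lot in live.get('lots', [])}
    merged = []
    for result in results:
        # Assumption: a lot can be absent from the live response.
        lot = lots_by_id.get(int(result['id']), {})
        # Assumption: the amount dict can be missing or lack an EUR entry.
        amount = (lot.get('current_bid_amount') or {}).get('EUR')
        merged.append({
            'id': result['id'],
            'title': result['title'],
            'price_eur': amount,
            'closed': lot.get('closed'),
        })
    return merged


# Made-up sample data, shaped like the structures used above.
sample_results = [{'id': 1, 'title': 'Lot A'}, {'id': '2', 'title': 'Lot B'}]
sample_live = {
    'lots': [{'id': 1, 'current_bid_amount': {'EUR': 100.0}, 'closed': True}]
}

sample_url = build_live_url(r['id'] for r in sample_results)
sample_merged = merge_live_data(sample_results, sample_live)
```

With real data you would pass the results list from data-props and the output of requests.get(build_live_url(...)).json() to merge_live_data, then print or store the merged rows.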