Problem extracting text from a div with BS4

Time: 2020-07-13 21:44:38

Tags: python beautifulsoup

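I am trying to use bs4 and json to get key information about some auction lots from this address (https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai). I managed to get most of the information from the dictionary stored in the div (as listed in the for loop), but the "price" and the "lot status" are not in the same place. No matter what I try with select, select_one, find, or find_all, these values never appear in the printed result, as if they did not exist in the original markup. What am I doing wrong? Is there a limit to how deep into nested divs bs4 can go? Why does the price show up in the page source but not in the soup?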

Here is my code; the problem occurs in the last two lines:

from bs4 import BeautifulSoup
import requests
import json
page = requests.get(
    'https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai'
)
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')

# extracting the value of 'results'

data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
result = data_prop.get('results')

# selecting items from dictionary and attributing values to each one of them
for i in range(len(result)):
    ids = result[i]['id']
    titles = result[i]['title']
    subtitles = result[i]['subtitle']
    favoriteCounts = result[i]['favoriteCount']
    auctionIds = result[i]['auctionId']
    biddingStartTimes = result[i]['biddingStartTime']

# extracting prices and lot status

price_lot = container.find_all('div', class_='be-lot__price u-placeholder')
print(price_lot)

1 Answer:

Answer 0 (score: 0)

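The price and lot status are loaded separately from an external URL, https://www.catawiki.com/buyer/api/v1/lots/live?ids=, where ids= takes the comma-separated IDs of the items on the page. That is why they are missing from the HTML that requests downloads and BeautifulSoup parses.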
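The URL construction on its own can be sketched as a small helper. The names `LIVE_LOTS_ENDPOINT` and `build_live_lots_url` are made up for illustration; only the endpoint and the comma-separated `ids=` parameter come from the observation above, and the endpoint may of course change.

```python
# Endpoint as observed above; treat it as an assumption that may change.
LIVE_LOTS_ENDPOINT = 'https://www.catawiki.com/buyer/api/v1/lots/live'


def build_live_lots_url(lot_ids):
    """Return the live-lots URL for an iterable of lot IDs (hypothetical helper)."""
    # IDs may be ints or strings, so convert each one before joining with commas.
    ids = ','.join(str(lot_id) for lot_id in lot_ids)
    return LIVE_LOTS_ENDPOINT + '?ids=' + ids


# Made-up IDs, only to show the shape of the resulting URL.
example_url = build_live_lots_url([101, '102', 103])
print(example_url)
```

The result can be passed straight to `requests.get(...)`.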

For example:

from bs4 import BeautifulSoup
import requests
import json

page = requests.get('https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')

# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
results = data_prop.get('results')

lots = requests.get('https://www.catawiki.com/buyer/api/v1/lots/live?ids=' + ','.join(str(result['id']) for result in results)).json()
lots = {l['id']: l for l in lots['lots']}

# uncomment to see all data:
# print(json.dumps(lots, indent=4))

for result in results:
    ids = result['id']
    titles = result['title']
    subtitles = result['subtitle']
    favoriteCounts = result['favoriteCount']
    auctionIds = result['auctionId']
    biddingStartTimes = result['biddingStartTime']

    price = lots[int(ids)]['current_bid_amount']['EUR']
    closed = lots[int(ids)]['closed']

    print(titles)
    print('Price:', price)
    print('Closed:', closed)
    print('-' * 80)

Prints:

Katana - Tamahagane stem -  肥前国住廣任 Hizen kuni ju Hiroto met een NBTHK Hozon certificaat. - Japan - early Edo period, 17th century.
Price: 3700.0
Closed: True
--------------------------------------------------------------------------------
Yoroi (1) - Leather - Samurai - Japan - Meiji period (1868-1912)
Price: 2800.0
Closed: True
--------------------------------------------------------------------------------
Katana, Sword - Tamahagane steel - Rare KIKU engraved- NBTHK - 68cm Nagasa - Japan - 17th century
Price: 11000.0
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Benkei and Mii-Dera Bell - Japan - Edo Period (1600-1868)
Price: 1100
Closed: True
--------------------------------------------------------------------------------
Wakizashi - Steel - Mumei , toegeschreven aan 1e gen. Bizen Yokoyama Sukekane, Hoge kwaliteit montering ! - Japan - ca. 1850
Price: 2400.0
Closed: True
--------------------------------------------------------------------------------
Fuchikashira - Copper, Gold, Shakudo - Japan - Edo Period (1600-1868)
Price: 270
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Flowers - NBTHK Tokubetsu Kichio - Japan - Edo Period (1600-1868)
Price: 340.0
Closed: True
--------------------------------------------------------------------------------
Mengu/ Menpo - Lacquered metal - Lacquered metal facial samurai mask (menpo), with four-piece gorget (nodowa) with blue cords. - Japan - Meiji period (1868-1912)
Price: 390.0
Closed: True
--------------------------------------------------------------------------------

...and so on.
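
If a lot is missing from the live response, or has no amount in the chosen currency, indexing the dictionaries directly raises a KeyError. Below is a minimal sketch of a more defensive merge step. It assumes the response keeps the shape used above (`lots`, `id`, `current_bid_amount`, `closed`); the function name `merge_live_data` and the sample data are invented for illustration and are not real Catawiki data.

```python
def merge_live_data(results, payload, currency='EUR'):
    """Combine static page results with live lot data (hypothetical helper)."""
    # Index the live lots by ID for constant-time lookup.
    lots_by_id = {lot['id']: lot for lot in payload.get('lots', [])}
    merged = []
    for result in results:
        # Fall back to an empty dict when the lot is absent from the payload.
        lot = lots_by_id.get(int(result['id']), {})
        merged.append({
            'id': result['id'],
            'title': result.get('title'),
            # None when the lot or the currency is missing.
            'price': (lot.get('current_bid_amount') or {}).get(currency),
            'closed': lot.get('closed'),
        })
    return merged


# Made-up sample data shaped like the fields used in the answer.
sample_results = [
    {'id': 101, 'title': 'Sample tsuba'},
    {'id': 102, 'title': 'Sample katana'},
]
sample_payload = {
    'lots': [{'id': 101, 'current_bid_amount': {'EUR': 1100}, 'closed': True}]
}

rows = merge_live_data(sample_results, sample_payload)
for row in rows:
    print(row['title'], row['price'], row['closed'])
```

With real data, `results` would be the list taken from `data-props` and `payload` the parsed JSON from the live-lots request.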