Error when scraping specific product fields from an e-commerce site with Python?

Time: 2019-07-24 14:37:58

Tags: python python-3.x selenium web-scraping python-requests

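I am trying to get product details from an e-commerce store. I have successfully scraped information from the search page, but scraping the product details fails. I would like to try Selenium, but I don't understand how to use it. How can I fix this?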

I tried scraping with Python 3.6 and requests.

Here is my code:

import requests

headers = {
    'User-Agent': 'Mozilla/5',
    'Referer': 'https://shopee.com.my/search?keyword=ws331c'
}

url = 'https://shopee.co.id/api/v2/search_items/?by=relevancy&keyword=ws331c&limit=50&newest=0&order=desc&page_type=search'  
r = requests.get(url, headers = headers).json()

itemid = []
shopid = []
for item in r['items']:
    #print(item['itemid'], ' ', item['shopid'], ' ', item['price']/100000)
    itemid.append(item['itemid'])
    shopid.append(item['shopid'])

print(shopid[0])

url = ('https://shopee.co.id/api/v2/item/get?itemid={}&shopid={}').format(itemid[0], shopid[0])  
p = requests.get(url, headers = headers).json()

for detail in p['item']:
    print(detail['itemid'])

I want to get the item ID from the product details, but the output is:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    print(detail['itemid'])
TypeError: string indices must be integers

1 answer:

Answer 0 (score: 0)

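It works fine for me. I used Shopee for the trial. The error comes from looping over p['item']: it is a dict, so the loop yields its keys as strings, and detail['itemid'] then tries to index a string. Reading the fields directly from p['item'] avoids that.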
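To see why the original loop raises the TypeError, here is a minimal sketch that needs no network access. The dict `p` is a made-up stand-in for the parsed JSON response (its values are assumptions, not real Shopee data); only the shape, a dict stored under the 'item' key, mirrors what the code in the question relies on.

```python
# Stand-in for the parsed JSON response; the values are invented for illustration.
p = {'item': {'itemid': 123, 'shopid': 456}}

# Iterating over a dict yields its keys, which are strings here.
keys = [detail for detail in p['item']]

error = None
try:
    for detail in p['item']:
        # detail is the string 'itemid', so this indexes a string with a string.
        detail['itemid']
except TypeError as exc:
    error = str(exc)

# Indexing the dict directly returns the value that was wanted.
item_id = p['item']['itemid']

print(keys)     # the dict's keys
print(error)    # the TypeError message
print(item_id)  # the item ID
```

The loop variable never holds a product record, only a key name, which is why the fix is to drop the loop and index p['item'] directly.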

import requests

headers = {
    'User-Agent': 'Mozilla/5',
    'Referer': 'https://shopee.com.my/search?keyword=ws331c'
}

url = 'https://shopee.co.id/api/v2/search_items/?by=relevancy&keyword=ws331c&limit=50&newest=0&order=desc&page_type=search'  
r = requests.get(url, headers = headers).json()

itemid = []
shopid = []
for item in r['items']:
    #print(item['itemid'], ' ', item['shopid'], ' ', item['price']/100000)
    itemid.append(item['itemid'])
    shopid.append(item['shopid'])

# Print a header row, then one line per item found by the search.
print('item_id, price, sold, rating')
for item_id, shop_id in zip(itemid, shopid):
    url = 'https://shopee.co.id/api/v2/item/get?itemid={}&shopid={}'.format(item_id, shop_id)
    p = requests.get(url, headers=headers).json()

    # p['item'] is a dict, so index its fields directly instead of iterating over it.
    item = p['item']
    print(item['itemid'], ' ', item['price']/100000, ' ', item['historical_sold'], item['item_rating']['rating_star'])
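If some responses might lack the 'item' object or one of its fields, the lookups above would raise instead of skipping that product. Below is a hedged sketch of a more defensive extraction step. The helper name `extract_fields` and the sample payload are hypothetical; the field names and the division by 100000 are taken from the code above, and whether the API ever returns incomplete payloads is an assumption.

```python
def extract_fields(payload):
    """Return (itemid, price, sold, rating) from a detail payload, or None.

    Hypothetical helper: it assumes the payload has the same shape that the
    answer's code reads, and tolerates missing pieces instead of raising.
    """
    item = payload.get('item')
    if not isinstance(item, dict):
        # No usable product record in this response.
        return None

    price = item.get('price')
    # item_rating may be absent or None, so fall back to an empty dict.
    rating = (item.get('item_rating') or {}).get('rating_star')

    return (
        item.get('itemid'),
        price / 100000 if price is not None else None,
        item.get('historical_sold'),
        rating,
    )


# Made-up sample payload, used only to exercise the helper offline.
sample = {
    'item': {
        'itemid': 1,
        'price': 2500000,
        'historical_sold': 7,
        'item_rating': {'rating_star': 4.5},
    }
}

print(extract_fields(sample))
print(extract_fields({'item': None}))
```

In the loop, the result of `requests.get(...).json()` could be passed to this helper and any `None` result skipped.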