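I'm trying to get product details from an e-commerce store. I've successfully scraped information from the search page, but scraping the product detail page fails. I'd like to try Selenium, but I don't understand it. How can I fix this?
I'm scraping with Python 3.6 and requests.
Here is my code: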
import requests

headers = {
    'User-Agent': 'Mozilla/5',
    'Referer': 'https://shopee.com.my/search?keyword=ws331c'
}
url = 'https://shopee.co.id/api/v2/search_items/?by=relevancy&keyword=ws331c&limit=50&newest=0&order=desc&page_type=search'
r = requests.get(url, headers = headers).json()
itemid = []
shopid = []
for item in r['items']:
    #print(item['itemid'], ' ', item['shopid'] ' ', item['price']/100000)
    itemid.append(item['itemid'])
    shopid.append(item['shopid'])
print(shopid[0])
url = ('https://shopee.co.id/api/v2/item/get?itemid={}&shopid={}').format(itemid[0], shopid[0])
p = requests.get(url, headers = headers).json()
for detail in p['item']:
    print(detail['itemid'])
I want to get the item ID from the product details, but the output is:
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    print(detail['itemid'])
TypeError: string indices must be integers
Answer 0 (score: 0)
This works fine for me. I used Shopee as a trial.
import requests

headers = {
    'User-Agent': 'Mozilla/5',
    'Referer': 'https://shopee.com.my/search?keyword=ws331c'
}
url = 'https://shopee.co.id/api/v2/search_items/?by=relevancy&keyword=ws331c&limit=50&newest=0&order=desc&page_type=search'
r = requests.get(url, headers = headers).json()
itemid = []
shopid = []
for item in r['items']:
    #print(item['itemid'], ' ', item['shopid'] ' ', item['price']/100000)
    itemid.append(item['itemid'])
    shopid.append(item['shopid'])
print('item_id, price, sold, rating')
for i in range(len(itemid)):
    url = ('https://shopee.co.id/api/v2/item/get?itemid={}&shopid={}').format(itemid[i], shopid[i])
    p = requests.get(url, headers = headers).json()
    print(p['item']['itemid'], ' ', p['item']['price']/100000, ' ', p['item']['historical_sold'], p['item']['item_rating']['rating_star'])
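The likely reason the question's loop raised the TypeError is that p['item'] appears to be a single dict rather than a list of records, so iterating over it yields its keys as strings, and indexing a string with 'itemid' fails. The sketch below reproduces this offline. The sample payload and the helper names item_id_via_loop and item_id_direct are hypothetical and only mimic the shape of the response; they are not real Shopee data or part of any API.

```python
# Hypothetical sample shaped like the `p` response in the question;
# the field values are made up for illustration only.
p = {"item": {"itemid": 123, "shopid": 456, "price": 2500000000}}

# Iterating over a dict yields its keys (strings), not nested records.
keys = [detail for detail in p["item"]]


def item_id_via_loop(payload):
    """Repeat the question's loop and return the error message it raises."""
    try:
        for detail in payload["item"]:
            return detail["itemid"]  # detail is a str such as "itemid"
    except TypeError as exc:
        return str(exc)


def item_id_direct(payload):
    """Index the nested dict directly, as the answer's code does."""
    return payload["item"]["itemid"]


print(keys)
print(item_id_via_loop(p))
print(item_id_direct(p))
```

Under that assumption, dropping the inner loop and reading p['item']['itemid'] directly is enough, and Selenium should not be needed for this particular error.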