Why is my web scraping code not extracting the data the way it should?

Time: 2019-11-01 02:32:33

Tags: python pandas selenium web-scraping

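I am trying to get data from an online shopping website. My code runs without any errors, but the data is not being extracted into the CSV file the way it should be. Where is the code going wrong?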

{{1}}

I expect the code to return data such as the name, price and rating of the products available on the website.

1 answer:

Answer 0 (score: 0)

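On Flipkart, the product data is loaded dynamically from a script tag: it only shows up in the rendered page once the browser executes the page's JavaScript. You can pull this information out of the raw HTML with a regular expression and parse it with a JSON parser, retrieving what you need using requests alone, without the overhead of Selenium.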
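A minimal, offline sketch of the regex-plus-JSON idea, run against a made-up HTML fragment instead of the live Flipkart page (the markup and sample values are assumptions; only the `window.__INITIAL_STATE__` name comes from the code below). Here the regex only locates where the assignment starts, and `json.JSONDecoder.raw_decode` then parses exactly one JSON value from that position, so any other statement that follows the object on the same line is ignored rather than swallowed by a greedy capture group.

```python
import json
import re

# Hypothetical page fragment: the real page is far larger and its layout may change.
SAMPLE_HTML = """
<html><body>
<script>window.__INITIAL_STATE__ = {"product": {"title": "Example Laptop", "price": 1000, "note": "a;b"}}; window.__OTHER__ = 1;</script>
</body></html>
"""

# Matches the start of the assignment, including the whitespace after '='.
MARKER = re.compile(r'window\.__INITIAL_STATE__\s*=\s*')


def extract_initial_state(html: str) -> dict:
    """Return the JSON object assigned to window.__INITIAL_STATE__ in the page."""
    match = MARKER.search(html)
    if match is None:
        raise ValueError("window.__INITIAL_STATE__ not found in page")
    # raw_decode parses one JSON value starting at match.end() and returns
    # (value, end_index); everything after the closing brace is ignored.
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state


state = extract_initial_state(SAMPLE_HTML)
print(state["product"]["title"])  # Example Laptop
```

With a live page you would pass `requests.get(url).text` to `extract_initial_state` instead of `SAMPLE_HTML`.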

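import json
import re

import requests

# The page state is assigned to window.__INITIAL_STATE__ on a single line inside a <script> tag
p = re.compile(r'window\.__INITIAL_STATE__ = (.*);')
r = requests.get('https://www.flipkart.com/lenovo-core-i3-6th-gen-4-gb-1-tb-hdd-windows-10-home-ip-320e-laptop/p/itmf3s32ghxrkrhf?pid=COMEWM7FTAQ9EHRF&srno=b_1_2&otracker=browse&lid=LSTCOMEWM7FTAQ9EHRFBL70ZV&fm=organic&iid=90098c10-e53b-49dc-9359-ff04338c0c4e.COMEWM7FTAQ9EHRF.SEARCH&ssid=2d6xzladk00000001572540087124')
r.raise_for_status()  # stop here on an HTTP error instead of parsing an error page
# Walk down to the widget holding the product details; these keys match this page and may change
data = json.loads(p.findall(r.text)[0])['pageDataV4']['page']['data']['10002'][1]['widget']['data']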

##data sections:
# data.keys()

##pricing info:
# data['pricing']['value'].keys()
# data['pricing']['value']['mrp'].keys()

##rating info:
# data['ratingsAndReviews']['value']['rating']

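# 'mrp' (maximum retail price) stores the currency and the amount separately
price = data['pricing']['value']['mrp']['currency'] + str(data['pricing']['value']['mrp']['value'])
# Join the 'title' and 'subtitle' components, in reverse of their order in the JSON, into one name
title = ' '.join(reversed([v for k, v in data['titleComponent']['value'].items() if k in ['title', 'subtitle']]))
average_rating = data['ratingsAndReviews']['value']['rating']['average']

print(title, price, average_rating)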
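Since the question's goal was a CSV file, here is one possible sketch of writing the three fields extracted above to CSV with the standard `csv` module. It assumes product dicts shaped like `data` in the answer (same key names), uses made-up sample values, and adds a hypothetical `dig` helper so that a product missing, say, a rating yields an empty cell instead of a `KeyError`.

```python
import csv
import io
from typing import Any

FIELDS = ['name', 'price', 'rating']


def dig(obj: Any, *path: Any, default: Any = None) -> Any:
    """Hypothetical helper: follow dict keys / list indices, returning default if a step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


def to_row(data: dict) -> dict:
    """Build one CSV row from a product dict using the key names from the answer."""
    currency = dig(data, 'pricing', 'value', 'mrp', 'currency', default='')
    amount = dig(data, 'pricing', 'value', 'mrp', 'value')
    price = f"{currency}{amount}" if amount is not None else ''
    # Title first, then subtitle; parts that are missing are skipped.
    parts = [dig(data, 'titleComponent', 'value', k) for k in ('title', 'subtitle')]
    name = ' '.join(p for p in parts if p)
    rating = dig(data, 'ratingsAndReviews', 'value', 'rating', 'average', default='')
    return {'name': name, 'price': price, 'rating': rating}


def write_products(rows: list, fh) -> None:
    """Write a header plus one line per product to an open text file."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample products; the second one has no price or rating.
samples = [
    {
        'titleComponent': {'value': {'title': 'Example Laptop', 'subtitle': 'Core i3'}},
        'pricing': {'value': {'mrp': {'currency': 'INR', 'value': 39990}}},
        'ratingsAndReviews': {'value': {'rating': {'average': 4.1}}},
    },
    {'titleComponent': {'value': {'title': 'Example Mouse'}}},
]

buffer = io.StringIO()
write_products([to_row(s) for s in samples], buffer)
print(buffer.getvalue())
```

For a real file, open it with `open('products.csv', 'w', newline='', encoding='utf-8')` and pass that handle instead of the `StringIO` buffer; with pandas, `pandas.DataFrame(rows).to_csv('products.csv', index=False)` does the same job.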