Looping when scraping with Python

Date: 2019-03-01 10:16:38

Tags: python json web-scraping

I am trying to scrape data with Python using this simple code:

import requests
import json

url = "https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=fais&type=2&limit=100"
r = requests.get(url)
cont = json.loads(r.content)
print(cont)

Output of the code (JSON):

[{u'phone': u'99399934', u'name': u'fai'}, {u'phone': u'99111267', u'name': u'Fai2 Basheer '}, {u'phone': u'50129494', u'name': u'Fai4 Delly '}]

This works fine for me, but I need a loop so that I can send multiple requests with different parameters, for example:

https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=JOHN&type=2&limit=6000
https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=SAM&type=2&limit=9000
https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=JOHN&type=2&limit=1000
https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=HARRY&type=2&limit=7000

The reason is that each different value of the limit parameter returns new data for the same keyword, because a single JSON request returns only 1000 rows.

2 answers:

Answer 0: (score: 0)

First approach

Put the URLs in a list, then iterate over it to get the response for each URL.

urls = ['https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=JOHN&type=2&limit=6000',
        'https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=SAM&type=2&limit=9000',
        'https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=JOHN&type=2&limit=1000',
        'https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=HARRY&type=2&limit=7000']

for url in urls:
    r = requests.get(url)
    cont = json.loads(r.content)
    print(cont)
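
The list above can also be generated instead of typed out by hand. This is a minimal sketch, assuming the endpoint takes only the keyword, type and limit parameters shown in the question. The names BASE_URL, searches and build_urls are illustrative and not part of the original answer.

```python
from urllib.parse import urlencode

# Endpoint from the question, without the query string
BASE_URL = "https://xxxxxxxx.com/getNamesEnc02Motasel2.php"

# (keyword, limit) pairs taken from the question
searches = [("JOHN", 6000), ("SAM", 9000), ("JOHN", 1000), ("HARRY", 7000)]


def build_urls(pairs, base_url=BASE_URL, type_value=2):
    """Return one URL per (keyword, limit) pair with an encoded query string."""
    urls = []
    for keyword, limit in pairs:
        # urlencode escapes spaces and other special characters in the keyword
        query = urlencode({"keyword": keyword, "type": type_value, "limit": limit})
        urls.append(base_url + "?" + query)
    return urls


urls = build_urls(searches)
for url in urls:
    print(url)
```

The resulting urls list can be passed to the same for loop as before. Encoding the query with urlencode also keeps keywords that contain spaces valid.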

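Second approach

Use a nested dict that holds the parameters for every request, and let requests build the query string.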

data = {
'data_1' : {'keyword': 'JOHN', 'type': '2', 'limit': '6000'},
'data_2' : {'keyword': 'SAM', 'type': '2', 'limit': '9000'},
'data_3' : {'keyword': 'JOHN', 'type': '2', 'limit': '1000'},
'data_4' : {'keyword': 'HARRY', 'type': '2', 'limit': '7000'}
}

for param in data:
    # requests builds and encodes the query string from the params dict
    r = requests.get("https://xxxxxxxx.com/getNamesEnc02Motasel2.php", params=data[param])
    cont = json.loads(r.content)
    print(cont)
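
The parameter dicts do not have to be written out one by one. This is a rough sketch, assuming you want every combination of a few keywords and a few limit values. build_param_sets is a hypothetical helper, and the keywords and limits below are only examples.

```python
def build_param_sets(keywords, limits, type_value="2"):
    """Return a list of query-parameter dicts, one per keyword/limit combination."""
    return [
        {"keyword": keyword, "type": type_value, "limit": str(limit)}
        for keyword in keywords
        for limit in limits
    ]


# Example values only: two keywords, limits 1000, 2000 and 3000
param_sets = build_param_sets(["JOHN", "SAM"], range(1000, 4000, 1000))
for params in param_sets:
    # Each dict can be passed as params=... to requests.get, as in the loop above
    print(params)
```

A flat list of dicts avoids the artificial data_1, data_2 keys, since the loop only ever needs the inner dict.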

Answer 1: (score: 0)

import requests
import json

# Each entry has the form "keyword:limit"
parameters = [
    'JOHN:6000',
    'SAM:9000',
    'JOHN:1000',
    'HARRY:7000',
]

for item in parameters:
    # Split the entry into the keyword and its limit
    key, value = item.split(':')
    url = "https://xxxxxxxx.com/getNamesEnc02Motasel2.php?keyword=%s&type=2&limit=%s" % (key, value)
    r = requests.get(url)
    cont = json.loads(r.content)
    print(cont)
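
The question says that different limit values return further rows for the same keyword, so the responses collected in the loop will probably need to be combined. This is a minimal sketch, assuming a row can be identified by its phone and name fields; that is an assumption, since the API's behaviour is not documented here. merge_rows is a hypothetical helper, and the two sample lists reuse values from the output shown in the question.

```python
def merge_rows(batches):
    """Combine several response lists, keeping the first copy of each row."""
    seen = set()
    merged = []
    for batch in batches:
        for row in batch:
            # Assumption: phone plus name is enough to recognise a duplicate
            key = (row.get("phone"), row.get("name"))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Two sample responses that overlap on one row
first = [{"phone": "99399934", "name": "fai"},
         {"phone": "99111267", "name": "Fai2 Basheer "}]
second = [{"phone": "99111267", "name": "Fai2 Basheer "},
          {"phone": "50129494", "name": "Fai4 Delly "}]

all_rows = merge_rows([first, second])
print(len(all_rows))
```

In the loop, each cont would be appended to a list instead of printed, and that list handed to merge_rows once all requests have finished.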