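I need help with looping in Python. After several days of searching, I've given up...
System: Windows (Anaconda)
Idea: "I wrote an HTML parser script, but because I lack knowledge and experience with Python scripting, it seems I have to run it separately for each page. I couldn't fix that, so I decided to wrap the script in a loop and have it run 100 times, once for each of the 100 pages." ...but I can't find the right way to do it...
My script: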
import requests
import pandas as pd
import urllib.parse
import urllib.request
import re
import os
import sys
url = "*******************/store/index.php"
querystring ={"id":"***","act":"search","***":"***","country":"",
"state":"*","city":"","zip":"","type":"","base":"","PAGENUM":"2"}
headers = {
'Host': "www.*****",
'Connection': "keep-alive",
'Upgrade-Insecure-Requests': "1",
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
'Referer': "h************/store/index.php?id=********************&pagenum=2",
'Accept-Encoding': "gzip, deflate",
'Accept-Language': "en-US,en;q=0.9",
'Cookie': "php_session_id_real=**********; cookname=**********; cook******",
'cache-control': "no-cache",
'Postman-Token': "**************************"
}
response = requests.request("GET", url, headers=headers,params=querystring)
df_list = pd.read_html(response.text)
df = df_list[-1]
print(df)
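What I need to change is the PAGENUM query-string parameter (e.g. &pagenum=2, 3, 10, 50, etc.).
Is it possible to run this Python script X times, increasing the value each time (pagenum = pagenum + 1)?
Hoping for your suggestions!
Cheers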
Answer 0 (score: 1)
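Use a for loop and iterate over a list that contains all the page numbers you need.
Then use str() to store each value in the querystring dictionary.
Do the following: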
import requests
import pandas as pd
import urllib.parse
import urllib.request
import re
import os
import sys
pagenums = [2, 3, 10, 50]
# or, for every page from 1 to 100: pagenums = range(1, 101)
for page in pagenums:
    querystring = {"id":"***","act":"search","***":"***","country":"",
                   "state":"*","city":"","zip":"","type":"","base":"","PAGENUM":str(page)}
    # ... the rest of your code (headers, the request, read_html, print) goes here,
    # indented so that it runs once per page
On each iteration, the PAGENUM key is set to the current page number.
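The per-iteration update described above can be checked without sending any requests. The sketch below builds the query string for a few pages with the standard library's urllib.parse.urlencode, which is roughly what requests does with its params argument. BASE_PARAMS is a hypothetical stand-in for the masked parameters in the question, and build_query is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

# Hypothetical stand-ins for the masked parameters in the question.
BASE_PARAMS = {"act": "search", "state": "*", "country": ""}


def build_query(page: int) -> str:
    # Copy the base dict so every iteration starts from the same values,
    # then set PAGENUM as a string, as the answer suggests.
    params = dict(BASE_PARAMS, PAGENUM=str(page))
    return urlencode(params)


for page in [2, 3, 10, 50]:
    print(build_query(page))
```

Only the PAGENUM part of the printed query string differs from one line to the next; the shared base dictionary itself is never modified.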
Answer 1 (score: 0)
You need a loop that runs 100 times and goes through all of your pages. Hopefully the code below works.
import requests
import pandas as pd
import urllib.parse
import urllib.request
import re
import os
import sys
import numpy as np
url = "*******************/store/index.php"
# The headers are the same for every page, so define them once, before the loop
headers = {
    'Host': "www.*****",
    'Connection': "keep-alive",
    'Upgrade-Insecure-Requests': "1",
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'Referer': "h************/store/index.php?id=********************&pagenum=2",
    'Accept-Encoding': "gzip, deflate",
    'Accept-Language': "en-US,en;q=0.9",
    'Cookie': "php_session_id_real=**********; cookname=**********; cook******",
    'cache-control': "no-cache",
    'Postman-Token': "**************************"
}
pagenums = np.arange(1, 101)  # pages 1 to 100
for i in pagenums:
    querystring = {"id":"***","act":"search","***":"***","country":"",
                   "state":"*","city":"","zip":"","type":"","base":"","PAGENUM":str(i)}
    response = requests.request("GET", url, headers=headers, params=querystring)
    df_list = pd.read_html(response.text)
    df = df_list[-1]
    print(df)
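In the loop above, df is reassigned on every iteration, so once the loop ends only the last page's table is left. A minimal sketch of one way to keep every page, assuming the pages run from 1 to 100: collect each table in a list and combine them with pd.concat. It also reuses one requests.Session, pauses between requests, and wraps the HTML in io.StringIO, since recent pandas versions deprecate passing a literal HTML string to read_html. URL, BASE_PARAMS, fetch_table and scrape_pages are hypothetical placeholders (the real URL and parameters are masked in the question).

```python
import io
import time

import pandas as pd
import requests

# Hypothetical placeholders: the real URL and parameters are masked in the question.
URL = "https://example.com/store/index.php"
BASE_PARAMS = {"act": "search", "state": "*"}


def fetch_table(session: requests.Session, page: int) -> pd.DataFrame:
    """Download one results page and return the last HTML table on it."""
    params = dict(BASE_PARAMS, PAGENUM=str(page))
    response = session.get(URL, params=params, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    # StringIO avoids the pandas deprecation of literal HTML strings in read_html.
    return pd.read_html(io.StringIO(response.text))[-1]


def scrape_pages(pages, fetch=fetch_table, delay=1.0, headers=None) -> pd.DataFrame:
    """Fetch every page in `pages` and stack the tables into one DataFrame."""
    frames = []
    with requests.Session() as session:  # one connection pool and cookie jar for all pages
        if headers:
            session.headers.update(headers)  # e.g. the headers dict from the question
        for page in pages:
            df = fetch(session, page)
            df["page"] = page  # remember which page each row came from
            frames.append(df)
            time.sleep(delay)  # pause between requests to go easy on the server
    return pd.concat(frames, ignore_index=True)
```

Usage: all_rows = scrape_pages(range(1, 101), headers=headers) returns one DataFrame with a page column. Passing a different fetch function (for example, one that reads saved HTML files) lets you test the loop without hitting the site.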