I'm trying to write a Python script that scrapes a series of subpages on a website and then writes the data to a file. I can't figure out how to put a variable into the URL and then loop over the list. This is what I have so far...
import httplib2
h = httplib2.Http('.cache')
s = ['one', 'two', 'three']
def getinfo():
    response, content = h.request('https-www.example.com/<list items>/info', headers={'Connection':'keep-alive'})
    print(content)
    print(response)

for q in range(len(s)):
    getinfo()
Answer 0 (score: 2)
import httplib2
h = httplib2.Http('.cache')
s = ['one', 'two', 'three']
def getinfo(subpage):
    response, content = h.request(
        'https-www.example.com/{}/info'.format(subpage),
        headers={'Connection': 'keep-alive'}
    )
    print(content)
    print(response)

for subpage in s:
    getinfo(subpage)
Answer 1 (score: 0)
Try this:
def getinfo(item):
    response, content = h.request('https-www.example.com/' + str(item) + '/info', headers={'Connection':'keep-alive'})
    print(content)
    print(response)

for q in s:
    getinfo(q)
Answer 2 (score: 0)
Maybe you need something like this:

import httplib2

h = httplib2.Http('.cache')
s = ['one', 'two', 'three']

def getinfo():
    for elem in s:
        response, content = h.request('https-www.example.com/' + elem + '/info', headers={'Connection':'keep-alive'})
        print(content)
        print(response)
Answer 3 (score: 0)
Another option is % formatting:
def getinfo(subpage):
    response, content = h.request('https-www.example.com/%s/info' % subpage, headers={'Connection':'keep-alive'})
    print(content)
    print(response)
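As a side note, on Python 3.6+ an f-string builds the same URL as the `.format()`, concatenation, and `%` variants shown in the answers. A minimal sketch with no network call (the `https-www.example.com` host is kept verbatim from the thread; `build_url` is a hypothetical helper name):

```python
s = ['one', 'two', 'three']

def build_url(subpage):
    # f-string interpolation; equivalent to '.../{}/info'.format(subpage)
    # and to 'https-www.example.com/%s/info' % subpage
    return f'https-www.example.com/{subpage}/info'

urls = [build_url(subpage) for subpage in s]
print(urls)
# ['https-www.example.com/one/info', 'https-www.example.com/two/info', 'https-www.example.com/three/info']
```

Each URL can then be passed to `h.request(...)` inside the loop exactly as in the accepted answer.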