Question

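I'm trying to create a Python script that scrapes a series of subpages on a website and then puts the data into a file. I'm not sure how to put a variable into the URL and then iterate over the list. Here's what I have so far...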

import httplib2
h = httplib2.Http('.cache')
s = ['one', 'two', 'three']


def getinfo():
    response, content = h.request('https://www.example.com/<list items>/info', headers={'Connection':'keep-alive'})
    print(content)
    print(response)

for q in range(len(s)):
    getinfo()

Answer 1

Use str.format:

import httplib2
h = httplib2.Http('.cache')
s = ['one', 'two', 'three']


def getinfo(subpage):
    response, content = h.request(
        'https://www.example.com/{}/info'.format(subpage),
        headers={'Connection': 'keep-alive'}
    )
    print(content)
    print(response)

for subpage in s:
    getinfo(subpage)
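On Python 3.6 and later, an f-string does the same substitution as str.format. If the list items could contain spaces or slashes, percent-encoding each one with urllib.parse.quote keeps it a single path segment. A small sketch; BASE_URL and build_url are illustrative names, and the base URL is assumed from the question:

```python
from urllib.parse import quote

BASE_URL = "https://www.example.com"  # assumed base URL from the question


def build_url(subpage: str) -> str:
    """Build the info URL for one subpage with an f-string.

    quote(..., safe="") percent-encodes characters such as spaces or "/"
    so each list item stays a single path segment.
    """
    return f"{BASE_URL}/{quote(subpage, safe='')}/info"


if __name__ == "__main__":
    for subpage in ["one", "two", "three"]:
        print(build_url(subpage))
```

Each URL can then be passed to h.request exactly as in the answer above.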

Answer 2

Try this:

def getinfo(item):
    response, content = h.request('https://www.example.com/' + str(item) + '/info', headers={'Connection':'keep-alive'})
    print(content)
    print(response)

for q in s:
    getinfo(q)
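The str(item) call matters if the list holds non-strings such as page numbers, because + between a str and an int raises TypeError. A quick sketch showing the difference (build_url and concat_without_str are hypothetical helper names, and the base URL is assumed):

```python
BASE_URL = "https://www.example.com"  # assumed base URL


def build_url(item) -> str:
    # str() lets non-string items (e.g. page numbers) be concatenated too
    return BASE_URL + "/" + str(item) + "/info"


def concat_without_str(item) -> str:
    # Without str(), "+" between a str and an int raises TypeError
    return BASE_URL + "/" + item + "/info"


if __name__ == "__main__":
    for item in ["one", 2, "three"]:
        print(build_url(item))
    try:
        concat_without_str(2)
    except TypeError as exc:
        print("TypeError:", exc)
```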

Answer 3

Maybe you need something like this:

import httplib2
h = httplib2.Http('.cache')
s = ['one', 'two', 'three']

def getinfo():
    for elem in s:
        response, content = h.request('https://www.example.com/' + elem + '/info', headers={'Connection':'keep-alive'})
        print(content)
        print(response)

getinfo()
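None of the answers cover the second half of the question, writing the data to a file. A minimal sketch, assuming the pages can simply be appended one after another to a single binary file; it uses urllib.request from the standard library instead of httplib2, and save_subpages, fetch, and the file name are illustrative:

```python
from typing import Callable, Iterable
from urllib.request import urlopen

BASE_URL = "https://www.example.com"  # assumed base URL from the question


def fetch(url: str) -> bytes:
    """Download one page with the standard library (stand-in for httplib2)."""
    with urlopen(url) as response:
        return response.read()


def save_subpages(subpages: Iterable[str], path: str,
                  fetcher: Callable[[str], bytes] = fetch) -> int:
    """Fetch BASE_URL/<subpage>/info for each subpage and write the bodies to one file.

    Returns the number of pages written. `fetcher` can be swapped out,
    e.g. for a wrapper around httplib2's h.request(...)[1].
    """
    count = 0
    with open(path, "wb") as out:
        for subpage in subpages:
            content = fetcher(f"{BASE_URL}/{subpage}/info")
            out.write(content)
            out.write(b"\n")  # separate pages with a newline
            count += 1
    return count
```

Usage: save_subpages(s, 'pages.txt'). To keep httplib2 and its '.cache' directory, pass a wrapper as the fetcher, e.g. save_subpages(s, 'pages.txt', lambda url: h.request(url)[1]), since h.request returns a (response, content) pair.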

Answer 4

Another option is %-formatting:

def getinfo(subpage):
    response, content = h.request('https://www.example.com/%s/info' % subpage, headers={'Connection':'keep-alive'})
    print(content)
    print(response)

for subpage in s:
    getinfo(subpage)
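With more than one placeholder, % needs a tuple on the right-hand side, and wrapping even a single value as (value,) guards against the value itself being a tuple. A short sketch (the function names and base URL are illustrative):

```python
BASE_URL = "https://www.example.com"  # assumed base URL


def build_url(subpage: str) -> str:
    # With several placeholders, the right-hand side must be a tuple
    return "%s/%s/info" % (BASE_URL, subpage)


def build_url_from_value(value) -> str:
    # A one-element tuple (value,) is safest: a bare tuple value would be
    # unpacked into the placeholders and raise TypeError here
    return "https://www.example.com/%s/info" % (value,)
```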

Python: URLs and a list of targets
