Question

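I previously posted this question asking for help with a Python script and didn't get much feedback, which is fine, because I worked out most of it on my own. I've run into some trouble, though.

My script currently looks like this: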

param1 = 
param2 = 
param3 = 

requestURL = "http://examplewebpage.com/live2/?target=param1&query=param2&other=param3"

html_content = urllib2.urlopen(requestURL).read()

matches = re.findall('<URL>(.*?)</URL>', html_content);

myList=[matches]

i = 0
while i < len(myList):
    testfile = urllib.URLopener()
    testfile.retrieve(myList[i], "/Users/example/file/location/newtest")
    i += 1

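This successfully retrieves all of the URLs from the web page, but I can't find a way to continue with the download step. I currently get the following error: 'list' object has no attribute 'strip'

Can anyone think of a better way to do this? Or should I be using a data type other than a list?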

Answer 1

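I think the main problem is that myList=[matches] creates a new list containing only one element. That single element is itself the list of matches.

So when you later access myList[0] in the loop, it is actually a list, not a URL string. Hence the error.

Assuming the rest of your code is correct, I think things will probably work if you just switch to myList=matches, but here is a version that uses clearer variable names and a for loop: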
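To make the nesting visible, here is a minimal sketch that needs no network access. The sample string and the name `wrapped` are made up for illustration; only the regular expression comes from the question.

```python
import re

# Hypothetical sample standing in for the fetched page content
html_content = "<URL>http://a.example/1</URL><URL>http://a.example/2</URL>"

matches = re.findall(r"<URL>(.*?)</URL>", html_content)

# Wrapping the result in brackets nests it: one element, which is a list
wrapped = [matches]
print(len(wrapped), type(wrapped[0]).__name__)

# Using the result directly gives one string per URL
print(len(matches), type(matches[0]).__name__)
```

Passing `wrapped[0]` to a function that expects a URL string hands it a whole list, which would explain why a string method such as `strip` is reported as missing.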

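import re
import urllib
import urllib2

requestURL = "http://examplewebpage.com/live2/?target=param1&query=param2&other=param3"

# Fetch the page and pull out every URL wrapped in <URL>...</URL>
html_content = urllib2.urlopen(requestURL).read()
matches = re.findall('<URL>(.*?)</URL>', html_content)

# matches is already a list of strings, so iterate over it directly
for url in matches:
    testfile = urllib.URLopener()
    testfile.retrieve(url, "/Users/example/file/location/newtest")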

Edit

Of course, every page will then be written to the same file, unless URLopener.retrieve does something like automatically renaming files?
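As far as I know, retrieve just writes to whatever path it is given, so one way around the overwriting is to build a distinct path per URL. The following is only a sketch under assumptions: it is written for Python 3 (urllib.parse rather than the Python 2 modules used above), and the helper name `local_path_for`, the sample URLs, and the `/tmp/downloads` directory are all hypothetical.

```python
import os
from urllib.parse import urlsplit


def local_path_for(url, index, directory):
    """Return a distinct destination path for a URL (hypothetical helper)."""
    # Use the last path segment of the URL as the base file name
    name = os.path.basename(urlsplit(url).path)
    if not name:
        # The URL ends in a slash, so fall back to a generic name
        name = "download"
    # Prefix with the loop index so identical names cannot collide
    return os.path.join(directory, "{:03d}_{}".format(index, name))


# Made-up URLs standing in for the regex matches
urls = [
    "http://a.example/files/report.pdf",
    "http://a.example/files/",
    "http://a.example/other/report.pdf",
]
paths = [local_path_for(u, i, "/tmp/downloads") for i, u in enumerate(urls)]
for path in paths:
    print(path)
```

Inside the download loop, each path would then be passed as the second argument in place of the fixed "newtest" file name, for example to urllib.request.urlretrieve(url, path) in Python 3.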
