Getting the source code of URLs

Date: 2013-07-07 17:37:56

Tags: python

I have the following code:

import urllib2
from itertools import product

with open('urllist.txt') as urllist:
    urls=[line.strip() for line in urllist]

for url in product(urls):
    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()
    sourcecode=open('./sourcecode', 'w+')
    sourcecode.write(data)

When I run it, I get:

Traceback (most recent call last):
  File "12.py", line 8, in <module>
    usock = urllib2.urlopen(url)
  File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 383, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'

Any idea how to fix this? Thanks a lot!

1 answer:

Answer 0 (score: 3)

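itertools.product yields tuples, not the items themselves, even when it is given a single iterable. urllib2.urlopen treats any non-string argument as a Request object and tries to set its timeout attribute, which is why a tuple triggers the AttributeError. For example: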

>>> from itertools import product
>>> lis = ['a','b','c']
>>> for p in product(lis):
...     print p
...     
('a',)
('b',)
('c',)

Use a simple loop over the URLs instead:

for url in urls:
    usock = urllib2.urlopen(url)
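
For completeness, here is a minimal sketch of the whole script built around that plain loop. It is not the asker's code: it assumes Python 3, where urllib2 became urllib.request, and the helper names read_urls, fetch and save_sources are made up for illustration. It also opens the output file once, before the loop, because opening './sourcecode' with 'w+' on every iteration (as in the question) truncates the file each time, so only the last page would survive.

```python
# Python 3 sketch; urllib2 from the question is urllib.request here.
from urllib.request import urlopen


def read_urls(path):
    """Return the non-empty, stripped lines of a URL list file."""
    with open(path) as urllist:
        return [line.strip() for line in urllist if line.strip()]


def fetch(url):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(url) as usock:
        return usock.read()


def save_sources(urls, out_path, fetch=fetch):
    """Write the body of every URL to out_path, one after another."""
    # Open the output once, outside the loop, so earlier pages are kept.
    with open(out_path, 'wb') as sourcecode:
        for url in urls:  # plain loop: url is a str, not a 1-tuple
            sourcecode.write(fetch(url))
```

Calling save_sources(read_urls('urllist.txt'), './sourcecode') would use the same file names as the question. The fetch parameter is only there so the download step can be swapped out, for example when trying the function without network access.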