Question

I've been following some Python tutorials and need some help. In the code below, on the line htmlfile = urllib.urlopen(urls[i]), I can't understand why [i] is needed after urls.

import urllib

urls = ["http://google.com","http://nytimes.com","http://cnn.com"]
i=0

while i< len(urls):
    htmlfile = urllib.urlopen(urls[i])
    htmltext = htmlfile.read()
    print htmltext
    i+=1

Answer 1

i is indexing the list urls, which lets you retrieve the items one at a time. See below:

>>> urls = ["http://google.com","http://nytimes.com","http://cnn.com"]
>>> i = 0
>>> while i < len(urls):
...     print i, urls[i]
...     i += 1
...
0 http://google.com
1 http://nytimes.com
2 http://cnn.com
>>>

Also, I'd like to mention that your code can be refactored to be more concise:

import urllib
urls = ["http://google.com","http://nytimes.com","http://cnn.com"]
for url in urls:
    print urllib.urlopen(url).read()

This new code does exactly the same thing as the old code.
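To make that equivalence concrete without touching the network, here is a minimal sketch. It is written for Python 3 (so print is a function, unlike the Python 2 code in this thread), and the helper names collect_with_index and collect_with_for are made up for illustration.

```python
urls = ["http://google.com", "http://nytimes.com", "http://cnn.com"]


def collect_with_index(items):
    """Visit items using an explicit counter, like the original while loop."""
    visited = []
    i = 0
    while i < len(items):
        visited.append(items[i])  # items[i] is the element at position i
        i += 1
    return visited


def collect_with_for(items):
    """Visit items directly, like the refactored for loop."""
    visited = []
    for item in items:
        visited.append(item)
    return visited


# Both approaches see the same items in the same order.
print(collect_with_index(urls) == collect_with_for(urls))
```

The only difference is who keeps track of the position: you (with i) or the for loop.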

Answer 2

urls is a list of strings. [i] refers to the element at index i in that list, so you access one site at a time.

It's worth noting that this is not a good, Pythonic way to iterate over a list. Your loop would be better and clearer as:

for url in urls:
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    print htmltext

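Something else worth considering: once you are comfortable with the code itself, you can do everything in that loop in one go, without assigning all those extra variables.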

for url in urls:
    print urllib.urlopen(url).read()
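The code in this thread is Python 2. As a rough sketch of the same loop on Python 3, where urlopen lives in urllib.request and read() returns bytes: the function name fetch_all and its opener parameter are hypothetical, added only so the loop can be tried offline with a stand-in.

```python
import io
from urllib.request import urlopen  # Python 3 location of urlopen


def fetch_all(urls, opener=urlopen):
    """Return the body of every URL, in list order.

    `opener` is a hypothetical hook; by default it is the real urlopen.
    """
    pages = []
    for url in urls:
        response = opener(url)
        try:
            pages.append(response.read())
        finally:
            response.close()
    return pages


def fake_opener(url):
    # Stand-in for urlopen: returns a file-like object, no network access.
    return io.BytesIO(("body of " + url).encode("utf-8"))


pages = fetch_all(["http://google.com", "http://cnn.com"], opener=fake_opener)
print(pages)
```

Calling fetch_all(urls) with no opener argument would use the real urlopen and hit the network.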

Answer 3

urls is a list. [i] selects one item from that list.

For example, if:

>>> urls = ["http://google.com","http://nytimes.com","http://cnn.com"]

then:

>>> urls[0]
'http://google.com'
>>> urls[1]
'http://nytimes.com'

and so on.

However, in your case I would use a for loop instead of a while loop, so you don't need to declare the loop variable beforehand. Like this:

import urllib

urls = ["http://google.com","http://nytimes.com","http://cnn.com"]


for i in range(len(urls)):
    htmlfile = urllib.urlopen(urls[i])
    htmltext = htmlfile.read()
    print htmltext
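If the index really is needed alongside the value (say, to number the output), the built-in enumerate provides both without range(len(...)). A small sketch for Python 3; label_urls is a hypothetical name used only here.

```python
urls = ["http://google.com", "http://nytimes.com", "http://cnn.com"]


def label_urls(items):
    """Pair each item with its 0-based position."""
    return ["%d %s" % (i, item) for i, item in enumerate(items)]


for line in label_urls(urls):
    print(line)
```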

Answer 4

This should be rewritten. You have a list, not a tuple, so the position of the items in the collection has no meaning.

import urllib

urls = ["http://google.com","http://nytimes.com","http://cnn.com"]

for url in urls:
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    print htmltext

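If you are iterating over all the items, using a counter is also not very idiomatic in Python. Use one only when you need a custom ordering, and in that case reach for the itertools package.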
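The answer does not say which itertools function it has in mind. As one possible reading, itertools.islice can pick items in a non-default pattern, and the built-in reversed covers the common "backwards" case. A sketch for Python 3:

```python
from itertools import islice

urls = ["http://google.com", "http://nytimes.com", "http://cnn.com"]

# Every second item, starting from the first: positions 0, 2, ...
every_other = list(islice(urls, 0, None, 2))

# Items in reverse order, using the built-in reversed() instead of a counter.
backwards = list(reversed(urls))

print(every_other)
print(backwards)
```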

Answer 5

urls is a list, so it has indices. To access a value in the list, you go through its index. Let me demonstrate:

>>> urls = ['hello', 'world']
>>> urls[0]
'hello'
>>> urls[1]
'world'
>>> len(urls)
2
>>>

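Note that indices are 0-based (the first element is accessed with 0, the second with 1). That is why the condition in the while statement reads while i < len(urls): i is used as the index, and since indices start at 0 rather than 1, in this two-item example i can only go up to 1, which is the second value in the list.

Let me demonstrate what happens if you go out of range by using 2 as the index: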

>>> urls[2]

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    urls[2]
IndexError: list index out of range
>>>

As you can see, you get an IndexError.

However, in your case there is a better way to go through the list of URLs, using a for loop:
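To tie the i < len(urls) condition to that error, here is a minimal Python 3 sketch. The helper safe_get is hypothetical; it simply catches the IndexError shown above.

```python
urls = ['hello', 'world']


def safe_get(items, index):
    """Return items[index], or None if the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return None


# range(len(urls)) yields exactly the indices that are safe to use.
valid_indices = list(range(len(urls)))

print(valid_indices)
print(safe_get(urls, 1))
print(safe_get(urls, 2))
```

Stopping the loop while i < len(urls) keeps i inside valid_indices, so the lookup never fails.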

# This loop will go through all the values inside your list, and the current value will be called url
for url in urls:  # Here url is the value inside the list
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    print htmltext

A demonstration using the for loop:

>>> for url in urls:
    print url


hello
world
>>>

I would also suggest python-requests, which is well suited to sending requests with common HTTP methods such as GET and POST. It will save you a lot of hassle in the future.

Answer 6

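urls is a list, so urls[i] is needed to index an item in the list. Without the index, you would be trying to open the list of URLs rather than a single URL.

The while loop starts at i=0 and keeps iterating while i < len(urls), so each item in urls is visited in turn.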

What does [i] after a variable name do?
