Question

I am trying to figure out how to read a website and print only the URLs it contains, but every time I run my code I get this error:

AttributeError: module 'urllib' has no attribute 'urlopen'

My code is below:

import os
import subprocess
import urllib

datasource = urllib.urlopen("www.google.com")

while 1:
        line = datasource.readline()
        if line == "": break
        if (line.find("www") > -1) :
                print (line)


li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    os.system('Start .\linkchecker ' + s)

Answer 1

It looks like you are on Python 3.x, so you should use

urllib.request.urlopen

Answer 2

Here is a very simple example.

This works on Python 3.2 and later.

import urllib.request
with urllib.request.urlopen("http://www.apple.com") as url:
    r = url.read()
print(r)

For reference, see this question: Urlopen attribute error

Answer 3

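The AttributeError occurs because the call should be urllib.request.urlopen rather than urllib.urlopen.

Besides the AttributeError mentioned in the question, I ran into two more errors.

ValueError: unknown url type: 'www.google.com'

Solution: rewrite the line that defines datasource so that the URL includes the https part, as follows: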

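datasource = urllib.request.urlopen("https://www.google.com")

TypeError: a bytes-like object is required, not 'str', on the line `if (line.find("www") > -1) :`. The response yields bytes, so it has to be converted to str before searching it with a str argument.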
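
Both of these errors can be reproduced and fixed without touching the network. The following is a minimal sketch, not part of the original answer: `with_scheme` and `lines_containing` are hypothetical helper names, and `sample_body` is made-up data standing in for the bytes that `datasource.read()` returns.

```python
from urllib.parse import urlsplit


def with_scheme(url, default="https"):
    """Return url unchanged if it already has a scheme, else prepend one."""
    if urlsplit(url).scheme:
        return url
    return f"{default}://{url}"


def lines_containing(body, needle, encoding="utf-8"):
    """Decode a bytes response body and return the lines that contain needle."""
    text = body.decode(encoding, errors="replace")
    return [line for line in text.splitlines() if needle in line]


# Stand-in for what datasource.read() returns: bytes, not str.
sample_body = b"<a href='http://www.example.com'>x</a>\nno match here\n"

print(with_scheme("www.google.com"))
print(lines_containing(sample_body, "www"))
```

Decoding as UTF-8 is an assumption here; a real page may declare a different charset in its Content-Type header.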

The complete solution code is:

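import os
import urllib.request  # "import urllib" alone does not load the request submodule

# urlopen needs the scheme (https://); the response yields bytes, not str
datasource = urllib.request.urlopen("https://www.google.com")

# Iterate over the response line by line; the loop ends when the body is exhausted
for raw_line in datasource:
    # Decode bytes to str before searching (UTF-8 is assumed here)
    line = raw_line.decode("utf-8", errors="replace")
    if "www" in line:
        print(line)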

li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    # Escape the backslash so Python does not treat "\l" as an escape sequence
    os.system('Start .\\linkchecker ' + s)
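
Note that filtering lines for "www" prints whole lines of markup rather than individual URLs. If the goal is to list every link on the page, one option is to parse the HTML with the standard library. This is only a sketch under assumptions: `LinkCollector` and `extract_links` are hypothetical names, and `sample_html` is made-up markup standing in for a decoded response body.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return all link targets found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Sample markup standing in for a decoded response body.
sample_html = '<p><a href="/about">About</a> <a href="https://www.apple.com/">Apple</a></p>'

for link in extract_links(sample_html, "https://www.example.com"):
    print(link)
```

For a live page, the text passed to `extract_links` would be the decoded result of `urllib.request.urlopen(url).read()`, and the resulting list could replace the hard-coded `li` above.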

How do I read a URL in Python and then print every URL on the website?
