Question

I am trying to pass existing URLs as arguments so that the HTML from each one is loaded into a single txt file:

for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
 if line.startswith('http') and line.endswith('html\n') :
    fichier = open("C:\Users\me\Desktop\other.txt", "a")
    allhtml = urllib.urlopen(line)
    fichier.write(allhtml)
    fichier.close()

But I get the following error:

TypeError: expected a character buffer object

Answer 1

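The value returned by urllib.urlopen() is a file-like object. Once you have opened it, you should read its contents with the read() method, as the following snippet shows: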

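import urllib  # Python 2 module that provides urlopen

for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open("C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib.urlopen(line)
        # urlopen returns a file-like object; read() gives the HTML as a string
        fichier.write(allhtml.read())
        fichier.close()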
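
To see why the original write() call fails, here is a small offline sketch. It assumes Python 3 and uses io.BytesIO as a stand-in for the file-like object that urlopen returns; the names fake_response and out and the HTML content are illustrative and do not come from the question.

```python
import io

# Stand-in for the file-like object urlopen returns (hypothetical content).
fake_response = io.BytesIO(b"<html><body>hello</body></html>")

# Stand-in for the output file, opened in binary mode.
out = io.BytesIO()

try:
    # Passing the file-like object itself is what triggers the TypeError.
    out.write(fake_response)
    wrote_object = True
except TypeError:
    wrote_object = False

# Calling read() first yields the actual content, which write() accepts.
out.write(fake_response.read())
result = out.getvalue()
```

The exact wording of the TypeError differs between Python versions, but the cause is the same: write() expects string or bytes data, not the response object.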

Hope this helps!

Answer 2

The problem here is that urlopen returns a reference to a file-like object, from which you then have to retrieve the HTML.

import urllib2  # Python 2 module

for line in open(r"C:\Users\me\Desktop\URLS-HERE.txt"):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open(r"C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib2.urlopen(line)
        fichier.write(allhtml.read())  # read() returns the HTML as a string
        fichier.close()

Note that the urllib.urlopen function has been marked as deprecated since Python 2.6. It is recommended to use urllib2.urlopen instead.
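
For readers on Python 3, where urlopen lives in urllib.request, the same loop could be sketched roughly as follows. The function name append_pages and its fetch parameter are assumptions introduced here for illustration (fetch only exists so the function can be tried without network access); they are not part of the original answers.

```python
from urllib.request import urlopen  # Python 3 location of urlopen


def append_pages(url_list_path, output_path, fetch=urlopen):
    """Append the body of every http...html URL listed in url_list_path to output_path.

    Returns the number of URLs that were fetched.
    """
    count = 0
    # "ab" because read() returns bytes in Python 3
    with open(url_list_path) as urls, open(output_path, "ab") as out:
        for line in urls:
            url = line.strip()  # drop the trailing newline before fetching
            if url.startswith("http") and url.endswith("html"):
                with fetch(url) as response:
                    out.write(response.read())
                count += 1
    return count
```

With the paths from the question, a call might look like append_pages(r"C:\Users\me\Desktop\URLS-HERE.txt", r"C:\Users\me\Desktop\other.txt"). Opening the output file once and using with blocks also ensures the files are closed if a download fails.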

In addition, you have to be careful with the paths in your code. You should either escape every \

"C:\\Users\\me\\Desktop\\other.txt"

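or use the r prefix before the string. When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change.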

r"C:\Users\me\Desktop\other.txt"
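
A quick way to check that the two spellings give the same string, and to see what goes wrong without them, is sketched below. The path C:\temp\new.txt is a hypothetical example chosen because it contains the \t and \n escape sequences.

```python
escaped = "C:\\Users\\me\\Desktop\\other.txt"  # every backslash doubled
raw = r"C:\Users\me\Desktop\other.txt"         # raw string, backslashes kept as-is

# Both spellings produce exactly the same string.
same_path = escaped == raw

# Without either fix, \t and \n are turned into control characters.
broken = "C:\temp\new.txt"
has_tab = "\t" in broken
has_newline = "\n" in broken
```

Here same_path is True, while broken silently contains a tab and a newline instead of the intended backslashes.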

Python urlopen return value
