Question

First, some background: I am writing a simple Python script that searches all pages of a website and collects specific HTML tags together with their text. My code:

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
allhtmllisturl = allsoupurl.find_all("h1", class_="title")

print(allhtmllisturl)

OK, this code works fine and shows all available h1 tags with the class "title". The result is:

[<h1 class="title">title-1</h1>]
[<h1 class="title">title-2</h1>]
[<h1 class="title">title-3</h1>]
[<h1 class="title">title-4</h1>]

But when I change the code like this:

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
allhtmllisturl = allsoupurl.find_all("h1", class_="title")

for h1 in allhtmllisturl:
    print(h1.get_text())

the script prints only the first available h1 tag and then ends, without showing the remaining tags. The result is:

title-1

What is the problem?

Thanks.

Answer 1

In find_all(), attribute filters such as id or class can be passed inside the attrs={} (attributes) dictionary:

import urllib.request
from bs4 import BeautifulSoup

lineline = urllib.request.urlopen("http://www.test-site.com")
lineliner = lineline.read()
allsoupurl = BeautifulSoup(lineliner, "html.parser")
# attrs takes a dict, so the key and value are separated by a colon
allhtmllisturl = allsoupurl.find_all("h1", attrs={"class": "title"})

for h1 in allhtmllisturl:
    print(h1.get_text())
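
One way to tell whether the filter or the page itself is at fault is to run both forms of the filter against a small inline HTML string instead of the live site. This is only a minimal sketch, assuming Beautiful Soup 4 is installed; SAMPLE_HTML and collect_titles are hypothetical names made up for this illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<html><body>
<h1 class="title">title-1</h1>
<h1 class="title">title-2</h1>
<h1 class="other">not-a-title</h1>
<h1 class="title">title-3</h1>
</body></html>
"""


def collect_titles(html):
    """Return the h1.title texts found via class_= and via attrs=."""
    soup = BeautifulSoup(html, "html.parser")
    # Keyword form, as used in the question.
    by_keyword = [h1.get_text() for h1 in soup.find_all("h1", class_="title")]
    # Dictionary form, as used in this answer.
    by_attrs = [h1.get_text() for h1 in soup.find_all("h1", attrs={"class": "title"})]
    return by_keyword, by_attrs


by_keyword, by_attrs = collect_titles(SAMPLE_HTML)

# The loop visits every matching tag, not only the first one.
for title in by_attrs:
    print(title)
```

If both lists come back complete here while the real page still yields a single title, the difference is more likely in the downloaded HTML, or in how the loop is indented in the actual script, than in find_all() itself.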
