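I open a web page with urllib2 and scrape it with BeautifulSoup, and I am trying to store specific text from that page.
Before you look at the code, here is a link to a screenshot of the page's HTML, so you can see how I am using the find function from BeautifulSoup:
Finally, here is the code I am using: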
from BeautifulSoup import BeautifulSoup
import urllib2
url = 'http://www.sciencekids.co.nz/sciencefacts/animals/bird.html'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
ul = soup.find('ul', {'class': 'style33'})
children = ul.findChildren()
for child in children:
    print child.text
Here is the output, which shows my problem:
Birds have feathers, wings, lay eggs and are warm blooded.
Birds have feathers, wings, lay eggs and are warm blooded.
There are around 10000 different species of birds worldwide.
There are around 10000 different species of birds worldwide.
The Ostrich is the largest bird in the world. It also lays the largest eggs and has the fastest maximum running speed (97 kph).
The Ostrich is the largest bird in the world. It also lays the largest eggs and has the fastest maximum running speed (97 kph).
Scientists believe that birds evolved from theropod dinosaurs.
Scientists believe that birds evolved from theropod dinosaurs.
Birds have hollow bones which help them fly.
Birds have hollow bones which help them fly.
Some bird species are intelligent enough to create and use tools.
Some bird species are intelligent enough to create and use tools.
The chicken is the most common species of bird found in the world.
The chicken is the most common species of bird found in the world.
Kiwis are endangered, flightless birds that live in New Zealand. They lay the largest eggs relative to their body size of any bird in the world.
Kiwis are endangered, flightless birds that live in New Zealand. They lay the largest eggs relative to their body size of any bird in the world.
Hummingbirds can fly backwards.
Hummingbirds can fly backwards.
The Bee Hummingbird is the smallest living bird in the world, with a length of just 5 cm (2 in).
The Bee Hummingbird is the smallest living bird in the world, with a length of just 5 cm (2 in).
Around 20% of bird species migrate long distances every year.
Around 20% of bird species migrate long distances every year.
Homing pigeons are bred to find their way home from long distances away and have been used for thousands of years to carry messages.
Homing pigeons are bred to find their way home from long distances away and have been used for thousands of years to carry messages.
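Is something in my code used incorrectly, so that I get two children where there should be only one? It would be easy to write extra code so that I don't store duplicate copies of the same information, but I would rather do this the right way, so that I get each string I am looking for only once.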
Answer 0 (score: 2)
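children = ul.findChildren() selects both the <li> elements and the <p> elements nested inside them, because findChildren() called with no arguments returns every descendant tag of the <ul>. Iterating over children therefore prints the text attribute of both elements, and each <li> has the same text as the <p> it contains. To fix this, change children = ul.findChildren() to children = ul.findChildren("p").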
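The duplication can be reproduced without fetching the page. The following is a minimal sketch, not the asker's exact setup: it assumes Python 3 with the bs4 package (BeautifulSoup 4) instead of the BeautifulSoup 3 and urllib2 used in the question, it uses find_all, which is the bs4 name for findChildren, and the html string is a hypothetical, reduced stand-in for the page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, reduced stand-in for the page's markup:
# each <li> wraps exactly one <p>.
html = (
    '<ul class="style33">'
    '<li><p>Hummingbirds can fly backwards.</p></li>'
    '<li><p>Birds have hollow bones which help them fly.</p></li>'
    '</ul>'
)

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", {"class": "style33"})

# No tag filter: every descendant tag is returned, so each fact shows up
# once for its <li> and once for the <p> nested inside it.
all_texts = [tag.get_text() for tag in ul.find_all(True)]

# Filtering on "p" returns only the paragraphs, one per fact.
p_texts = [tag.get_text() for tag in ul.find_all("p")]

print(len(all_texts), len(p_texts))
```

With this reduced markup the unfiltered search yields four strings and the filtered one yields two, which matches the doubled output shown in the question.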