Question

I'm trying to get all the <a> tags from a website, and I have this piece of code:

results = []

all_links = soup.find_all('article')
for link in all_links:
    print(link.find('div', class_="cb-category cb-byline-element"))

This prints the scraped data as follows (with commas separating the <a> tags):

<div class="cb-category cb-byline-element"><i class="fa fa-folder-o"></i> <a href="http://ridethetempo.com/category/canadian/" title="View all posts in Canadian">Canadian</a>,  <a href="http://ridethetempo.com/category/music/garage-rock/" title="View all posts in Garage">Garage</a>,  <a href="http://ridethetempo.com/category/listen-2/" title="View all posts in Listen">Listen</a>,  <a href="http://ridethetempo.com/category/music/" title="View all posts in Music">Music</a>,  <a href="http://ridethetempo.com/category/music/psychedelic/" title="View all posts in Psychedelic">Psychedelic</a>,  <a href="http://ridethetempo.com/category/under-2000/" title="View all posts in Under 2000">Under 2000</a></div>

However, if I do the following:

for link in all_links:
    results.append(link.find('div', class_="cb-category cb-byline-element"))
for div in results:
    print(div.find('a', href=True)['href'])

I get only the first <a> for each block of <div>, like so:

http://ridethetempo.com/category/canadian/

How do I retrieve all the <a> tags recursively, so that I end up with this result?

http://ridethetempo.com/category/canadian/ 
http://ridethetempo.com/category/music/garage-rock/
http://ridethetempo.com/category/listen-2/
http://ridethetempo.com/category/music/ 
http://ridethetempo.com/category/music/psychedelic/
http://ridethetempo.com/category/under-2000/

Answer 1

for link in soup.find_all('a'):
    print(link.get('href'))

This prints the href of every 'a' tag element on the page.
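Note that the loop above walks every <a> in the whole document, not only the ones inside the category blocks. If only the category links are wanted, a minimal sketch along the following lines may help. It assumes the markup looks like the output shown in the question; the sample HTML, the example.com link, and the helper name category_links are made up for illustration. The key point is that find() returns only the first match, while find_all() returns every match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the output shown in the question.
html = """
<article>
  <div class="cb-category cb-byline-element"><i class="fa fa-folder-o"></i>
    <a href="http://ridethetempo.com/category/canadian/">Canadian</a>,
    <a href="http://ridethetempo.com/category/music/">Music</a>
  </div>
</article>
<article>
  <div class="cb-category cb-byline-element">
    <a href="http://ridethetempo.com/category/listen-2/">Listen</a>
  </div>
</article>
<a href="http://example.com/about/">About</a>
"""

soup = BeautifulSoup(html, "html.parser")


def category_links(soup):
    """Collect the href of every <a> inside each article's category div."""
    hrefs = []
    for article in soup.find_all('article'):
        div = article.find('div', class_="cb-category cb-byline-element")
        if div is None:  # skip articles that have no category block
            continue
        # find_all() returns every matching <a>, not just the first one
        for a in div.find_all('a', href=True):
            hrefs.append(a['href'])
    return hrefs


for href in category_links(soup):
    print(href)
```

The link outside the <article> elements is skipped because the search is scoped to each category div. A CSS selector such as soup.select('div.cb-category.cb-byline-element a[href]') should give a similar result in one call, if that style is preferred.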
