Scrape a DIV, then grab all <a> tags in the siblings of that DIV

Time: 2017-10-16 02:25:00

Tags: python web-scraping beautifulsoup

What I'm trying to do is take the current date, store it in a variable, and then use that variable to find the matching date in a DIV. Once that DIV is located, I want to grab all the <a href> links within its sibling DIVs.

import re, time
from urllib2 import urlopen as uReq
import datetime as dt

from bs4 import BeautifulSoup as soup
my_url = 'http://www.domainname.com'  # urlopen needs the scheme (http://)

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

date = dt.datetime.today().strftime("%m-%d-%Y")
page_soup = soup(page_html, "lxml") 

### I feel like I'm missing something here!
### Need to use the variable (date) to find the DIV (i.e. 2017-10-21)
### Then collect the href links from all sub DIVs under that date, which I believe would use the code below?

links = page_soup.findAll('div', attrs={'class' : 'gameLinks'})
for div in links:
    link = div.find('a')['href']
    if "ufc" in link:  
        print """<a href="{link}">{link}</a><br>""".format(link=link)  

Any ideas?

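1 answer:

Answer 0 (score: 0)

Renaming every module on import makes your code hard to read, but apart from that, finding links inside a div (or anywhere) follows a typical convention.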

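
A minimal sketch of that convention applied to the question, under stated assumptions: the real page structure is unknown, so the `SAMPLE_HTML` markup, the `date` class name, and the `links_after_date` helper are all hypothetical. It assumes each date sits in its own DIV and that the `gameLinks` DIVs for that date follow it as siblings until the next date DIV.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="schedule">
  <div class="date">10-16-2017</div>
  <div class="gameLinks"><a href="http://example.com/ufc-1">UFC 1</a></div>
  <div class="gameLinks"><a href="http://example.com/nba-1">NBA 1</a><a href="http://example.com/ufc-2">UFC 2</a></div>
  <div class="date">10-17-2017</div>
  <div class="gameLinks"><a href="http://example.com/ufc-3">UFC 3</a></div>
</div>
"""


def links_after_date(page_html, date_text):
    """Return every href found in the sibling DIVs that follow the date DIV."""
    page_soup = BeautifulSoup(page_html, "html.parser")

    # Find the DIV (assumed class "date") whose text is exactly the date string.
    date_div = page_soup.find(
        lambda tag: tag.name == "div"
        and "date" in tag.get("class", [])
        and tag.get_text(strip=True) == date_text
    )
    if date_div is None:
        return []

    hrefs = []
    for sibling in date_div.find_next_siblings("div"):
        if "date" in sibling.get("class", []):
            break  # reached the next date DIV, so stop collecting
        # find_all returns every <a> that has an href, not just the first one.
        for a in sibling.find_all("a", href=True):
            hrefs.append(a["href"])
    return hrefs


today_links = links_after_date(SAMPLE_HTML, "10-16-2017")
ufc_links = [href for href in today_links if "ufc" in href]
```

With the question's code, the date string would come from `dt.datetime.today().strftime("%m-%d-%Y")`; the format string has to match however the page writes its dates. Using `find_all('a', href=True)` instead of `find('a')['href']` picks up every link in a DIV and skips anchors that have no `href`.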