How to get a specific HTML tag

Time: 2016-10-16 08:50:26

Tags: python parsing beautifulsoup

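I am trying to get the HTML tag when an item has no text. For example, I am looping over all the "a" elements (URLs).
Some of these links contain text and some do not. I want to get only the links that have no text.
So I did something like this.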

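import requests
from bs4 import BeautifulSoup

response = requests.get('https://fw.tmall.com/tmall/ser/tmall_detail.htm?spm=a1z1g.2177293.0.0.qF9gPO&service_code=ts-4078').text
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(response, 'html.parser')
# Every <a> inside the div with id="success-case"
main_wrapper = soup.find('div', attrs={'id': 'success-case'}).find_all('a')
for items in main_wrapper:
    href = items['href']
    if items.string is None:
        # No single text child (for example, the anchor wraps an <img>)
        print(href)
    else:
        print(items.string)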

Inside the if items.string is None: branch, how can I get only the URL that belongs to that item, instead of all the URLs?

2 answers:

Answer 0: (score: 0)

  

"I want to get only the links that have no text"

You can use a list comprehension:

hrefs = [a['href'] for a in main_wrapper if a.string is None]
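To make the filter above easier to try without hitting the live page, here is a minimal sketch on a small made-up snippet. The markup and the example.com URLs are assumptions modeled on the structure shown in this thread, not the real page. It also shows an alternative check based on get_text(), which may be more robust when an anchor mixes text with nested tags, because .string is None in that case too.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the structure discussed in this thread.
SAMPLE_HTML = """
<div id="success-case">
  <ul>
    <li>
      <a href="//example.com/shop/1" class="rel-ink"><img alt="Shop 1" src="//example.com/img/1.png"></a>
      <a href="//example.com/shop/1" class="rel-name">Shop 1</a>
    </li>
    <li>
      <a href="//example.com/shop/2" class="rel-ink"><img alt="Shop 2" src="//example.com/img/2.png"></a>
      <a href="//example.com/shop/2" class="rel-name">Shop 2</a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
anchors = soup.find("div", id="success-case").find_all("a")

# .string is None when the anchor has no single text child (here it wraps an <img>).
hrefs_by_string = [a["href"] for a in anchors if a.string is None]

# Alternative: treat an anchor as textless when its stripped text is empty.
hrefs_by_text = [a["href"] for a in anchors if not a.get_text(strip=True)]

print(hrefs_by_string)
print(hrefs_by_text)
```

On this snippet both lists contain only the hrefs of the image links.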
  

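"Get only the URL that belongs to that item, not all the URLs!"

It is not clear what this means. Each a tag has exactly one URL. You are iterating over a list of a tags, so you get a list of URLs.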

  

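"I want to get a specific HTML attribute; in this case it would be the IMG URL inside the <a>"

Then you need another find call inside the loop to extract the <img> element and read its src attribute.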
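A rough sketch of that inner find call, again on made-up markup (the snippet and its example.com URLs are assumptions for illustration). The image_urls name is hypothetical; the point is only that find("img") returns None for anchors without an image, so that case needs a guard.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor wraps an image, the other holds plain text.
SAMPLE_HTML = """
<div id="success-case">
  <a href="//example.com/shop/1" class="rel-ink"><img alt="Shop 1" src="//example.com/img/1.png"></a>
  <a href="//example.com/shop/1" class="rel-name">Shop 1</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

image_urls = []
for a in soup.find("div", id="success-case").find_all("a"):
    img = a.find("img")  # None when the anchor contains no <img>
    if img is not None and img.has_attr("src"):
        image_urls.append(img["src"])

print(image_urls)
```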

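Answer 1: (score: 0)

I think you are trying to get one anchor per item from the unordered list inside the div. As you can see, the two anchors in each item have different classes, rel-ink and rel-name: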

 <a href="//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358" target="_blank" class="rel-ink"><img alt="NIHAOMARKET官方海外旗舰店" src="//img.alicdn.com/top/i1/TB1urimJFXXXXabaXXXwu0bFXXX.png" class="rel-img"></a>
 <a href="//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358" target="_blank" class="rel-name">NIHAOMARKET官方海外旗舰店</a>

So you can select them by the class of the first anchor in each li, which is rel-ink:

urls = [a["href"] for a in soup.find('div', id="success-case").find_all("a", class_="rel-ink")]

Or use a CSS selector:

urls = [a["href"] for a in soup.select("#success-case ul li a.rel-ink")]

Both will give you:

['//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=2087799889', '//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358', '//store.taobao.com/shop/view_shop.htm?user_number_id=377676745', '//store.taobao.com/shop/view_shop.htm?user_number_id=2367059695', '//store.taobao.com/shop/view_shop.htm?user_number_id=449764134', '//store.taobao.com/shop/view_shop.htm?user_number_id=698389964', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=1125022434', '//store.taobao.com/shop/view_shop.htm?user_number_id=1071997040', '//store.taobao.com/shop/view_shop.htm?user_number_id=795947607', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=1071997040', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=377676745', '//store.taobao.com/shop/view_shop.htm?user_number_id=2367059695', '//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358']
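If you want to check offline that the two lookups agree, the sketch below runs both on a small made-up snippet with the same div > ul > li > a layout (the markup and example.com URLs are assumptions, not the real page). Since the list above contains repeated store URLs, it also shows one possible way to drop repeats while keeping first-seen order; unique_urls is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the layout described above; shop 1 appears twice.
SAMPLE_HTML = """
<div id="success-case">
  <ul>
    <li>
      <a href="//example.com/shop/1" class="rel-ink"><img alt="Shop 1" src="//example.com/img/1.png"></a>
      <a href="//example.com/shop/1" class="rel-name">Shop 1</a>
    </li>
    <li>
      <a href="//example.com/shop/2" class="rel-ink"><img alt="Shop 2" src="//example.com/img/2.png"></a>
      <a href="//example.com/shop/2" class="rel-name">Shop 2</a>
    </li>
    <li>
      <a href="//example.com/shop/1" class="rel-ink"><img alt="Shop 1" src="//example.com/img/1.png"></a>
      <a href="//example.com/shop/1" class="rel-name">Shop 1</a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Approach 1: find the container, then filter its anchors by class.
by_find_all = [a["href"] for a in soup.find("div", id="success-case").find_all("a", class_="rel-ink")]

# Approach 2: a single CSS selector.
by_select = [a["href"] for a in soup.select("#success-case ul li a.rel-ink")]

# Optional: remove repeated URLs; dict.fromkeys keeps the first occurrence of each key.
unique_urls = list(dict.fromkeys(by_select))

print(by_find_all == by_select)
print(unique_urls)
```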