Getting the correct node with CSS selectors in Beautiful Soup?

Time: 2017-03-17 09:37:47

Tags: python web-scraping beautifulsoup

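For the following link:

http://www.jabong.com/Adidas-Base-Mid-Dd-Blue-Round-Neck-T-Shirt-2733238.html

I need to get the product's fabric detail, "Polyester", but I get "Fabric" as the output instead. Here is the relevant part of the code: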

soup.find_all("span", {"class":"product-info-left"})[0].text
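Since the title asks about CSS selectors: the class selector span.product-info-left matches the label span itself, which is why its text is "Fabric". A minimal sketch, assuming (based on the answers below) that the value sits in a span immediately after the label span. The markup is a hypothetical, simplified stand-in for the live page, parsed with the stdlib html.parser so lxml is not required; it uses select_one() with the CSS adjacent-sibling combinator +.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: each label <span> is followed directly by a value <span>.
html = """
<ul class="prod-main-wrapper">
  <li><span class="product-info-left">Fabric</span><span>Polyester</span></li>
  <li><span class="product-info-left">Sleeves</span><span>Half Sleeves</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# The class selector matches the label span itself, so its text is the label.
label = soup.select_one("span.product-info-left")
print(label.text)  # Fabric

# "A + B" selects the B element that immediately follows A at the same level.
value = soup.select_one("span.product-info-left + span")
print(value.text)  # Polyester
```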

3 answers:

Answer 0 (score: 1)

Get the next_sibling of the node you found:

soup.find_all("span", {"class":"product-info-left"})[0].next_sibling.text
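One caveat with next_sibling: it returns whatever node comes next, including whitespace text between tags. The sketch below assumes hypothetical markup in which the two spans sit on separate lines; there, next_sibling is a whitespace string, while find_next_sibling("span") skips text nodes and returns the value span.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: a newline and indentation separate the label and value spans.
html = """<li>
  <span class="product-info-left">Fabric</span>
  <span>Polyester</span>
</li>"""
soup = BeautifulSoup(html, "html.parser")
label = soup.find("span", class_="product-info-left")

# next_sibling is the whitespace text node between the two spans, not the value span.
print(isinstance(label.next_sibling, NavigableString))  # True

# find_next_sibling("span") skips text nodes and returns the next <span> sibling.
value = label.find_next_sibling("span")
print(value.text)  # Polyester
```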

Answer 1 (score: 0)

You can use .next.next or .next_sibling here:

>>> soup.find_all("span", {"class":"product-info-left"})[0].next.next.text
'Polyester'
>>> soup.find_all("span", {"class":"product-info-left"})[0].next_sibling.text
'Polyester'
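The two routes work differently: .next (the Beautiful Soup 3 name for next_element) follows parse order, so from the label span it first reaches the span's own text "Fabric" and only then the next tag, whereas .next_sibling stays at the same level of the tree. A small sketch on hypothetical markup with no whitespace between the spans, where both routes land on the same tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no whitespace between the label and value spans.
html = '<li><span class="product-info-left">Fabric</span><span>Polyester</span></li>'
soup = BeautifulSoup(html, "html.parser")
label = soup.find("span", class_="product-info-left")

# next_element follows parse order: first the label's own text node...
print(label.next_element)               # Fabric
# ...then whatever was parsed after that text, here the value span.
print(label.next_element.next_element)  # <span>Polyester</span>

# next_sibling moves to the next node at the same tree level.
print(label.next_sibling)               # <span>Polyester</span>
```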

Answer 2 (score: 0)

The information you need is in the li tags inside the ul tag. Find the ul first, then get all the text in each li with stripped_strings:
In [47]: r = requests.get('http://www.jabong.com/Adidas-Base-Mid-Dd-Blue-Round-Neck-T-Shirt-2733238.html')

In [48]: soup = BeautifulSoup(r.text, 'lxml')

In [49]: ul = soup.find('ul', class_="prod-main-wrapper")

In [50]: for li in ul.find_all('li'):
    ...:     print(list(li.stripped_strings))
    ...:     
['Fabric', 'Polyester']
['Sleeves', 'Half Sleeves']
['Neck', 'Round Neck']
['Fit', 'Regular']
['Color', 'Blue']
['Style', 'Solid']
['SKU', 'AD004MA61NGOINDFAS']
['Model Stats', 'This model has height 6\'0",Chest 38",Waist 34"and is Wearing Size M.']
['Authorization', 'Adidas authorized online sales partner.', 'View Certificate']
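To look a value up by its label rather than by position, the rows can be collected into a dict. A sketch assuming the same ul/li structure; the markup is a hypothetical, reduced copy of the output shown above. Rows with several strings (like Authorization) have their remaining strings joined with a space, which is an arbitrary choice here.

```python
from bs4 import BeautifulSoup

# Hypothetical, reduced copy of the list shown in the output above.
html = """
<ul class="prod-main-wrapper">
  <li><span>Fabric</span><span>Polyester</span></li>
  <li><span>Sleeves</span><span>Half Sleeves</span></li>
  <li><span>Authorization</span><span>Adidas authorized online sales partner.</span> <a href="#">View Certificate</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", class_="prod-main-wrapper")

details = {}
for li in ul.find_all("li"):
    strings = list(li.stripped_strings)
    if not strings:
        continue  # skip rows with no visible text
    # First string is the label; join any remaining strings into one value.
    label, *values = strings
    details[label] = " ".join(values)

print(details["Fabric"])  # Polyester
```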

If you only want the first row, use find(), which returns the first element that find_all() would return:

In [51]: text = ul.find('li').stripped_strings  
In [52]: print(list(text))
['Fabric', 'Polyester']
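The same lookups can also be written with CSS selectors: select_one() returns the first match (like find()) and select() returns all matches (like find_all()). Both find() and select_one() return None when nothing matches. A sketch on hypothetical, reduced markup with the same class name:

```python
from bs4 import BeautifulSoup

# Hypothetical, reduced markup using the class name from the answer above.
html = """
<ul class="prod-main-wrapper">
  <li><span>Fabric</span><span>Polyester</span></li>
  <li><span>Sleeves</span><span>Half Sleeves</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# First <li> anywhere inside the ul, like ul.find('li').
first_li = soup.select_one("ul.prod-main-wrapper li")
print(list(first_li.stripped_strings))  # ['Fabric', 'Polyester']

# Every matching <li>, like ul.find_all('li').
rows = [list(li.stripped_strings) for li in soup.select("ul.prod-main-wrapper li")]
```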