Question

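I have 4 URLs, and I need the sleeve details from each one. The position of the sleeve detail changes from page to page, so the node that stores it changes too. For the first URL the sleeve detail is in the 2nd position; for the other three URLs it is in the 3rd position. I need output like the following: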

URLS                                                                                                        Sleeves
http://www.jabong.com/belle-fille-Green-Solid-Winter-Jacket-1310755.html?pos=5&cid=BE797WA44OZRINDFAS   Full Sleeves
http://www.jabong.com/oxolloxo-Off-White-Solid-Reversible-Blazer-2687327.html?pos=8&cid=OX344WA72XITINDFAS  Long Sleeve
http://www.jabong.com/oxolloxo-Multicoloured-Checked-Blazer-2784283.html?pos=16&cid=OX344WA16KTVINDFAS  3/4th Sleeves
http://www.jabong.com/mirika-Blue-Embellished-WINTER-JACKET-2754538.html?pos=19&cid=MI137WA61STUINDFAS  Sleeveless

Here is the relevant part of my code:

For the 1st URL: soup.find_all("span", {"class":"product-info-left"})[1].next_sibling.text

For the 2nd to 4th URLs: soup.find_all("span", {"class":"product-info-left"})[2].next_sibling.text

Answer 1

soup.find("span", text="Sleeves").next_sibling.text
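The idea in this answer is to locate the label node by its text rather than by its index, so the lookup keeps working when the row moves. Below is a minimal sketch of that approach. The HTML snippets are hypothetical (the real page markup is not shown in the question), and the helper name `sleeve_detail` is made up for illustration. It uses `string=` (the newer name for the `text=` argument) and `find_next_sibling("span")` instead of `next_sibling`, on the assumption that there may be whitespace text nodes between the two spans.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is an assumption.
PAGE_A = """
<ul>
  <li><span class="product-info-left">Fabric</span><span>Polyester</span></li>
  <li><span class="product-info-left">Sleeves</span><span>Full Sleeves</span></li>
</ul>
"""
PAGE_B = """
<ul>
  <li><span class="product-info-left">Fabric</span><span>Cotton</span></li>
  <li><span class="product-info-left">Fit</span><span>Regular</span></li>
  <li><span class="product-info-left">Sleeves</span><span>Sleeveless</span></li>
</ul>
"""

def sleeve_detail(html):
    """Return the text next to the 'Sleeves' label, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the <span> whose entire text is exactly "Sleeves".
    label = soup.find("span", string="Sleeves")
    if label is None:
        return None
    # Skip any whitespace text nodes and take the next <span> sibling.
    value = label.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None

print(sleeve_detail(PAGE_A))  # Full Sleeves
print(sleeve_detail(PAGE_B))  # Sleeveless
```

The same function handles both pages even though the label sits at index 1 in one and index 2 in the other.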

Answer 2

You can simply search for the strings that contain 'Sleeve'.

def check(text):
    # Keep only strings that contain 'Sleeve'
    return text is not None and 'Sleeve' in text

sleeves = soup.find_all(string=check)
print(sleeves[1])

Output

Full Sleeves
Long Sleeve
3/4th Sleeves
Sleeveless

To learn about filtering with a function, check this link.
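As a rough illustration of function-based filtering: `find_all(string=...)` accepts a callable, calls it on each text node, and keeps the nodes for which it returns a true value. The markup and the function name `mentions_sleeve` below are hypothetical. With this assumed structure the first match is the label itself and the second is the value, which would explain why the answer indexes `[1]`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is an assumption.
html = """
<ul>
  <li><span class="product-info-left">Fabric</span><span>Polyester</span></li>
  <li><span class="product-info-left">Sleeves</span><span>Full Sleeves</span></li>
</ul>
"""

def mentions_sleeve(text):
    # Called once per text node; keep the ones containing 'Sleeve'.
    return text is not None and "Sleeve" in text

soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(string=mentions_sleeve)
texts = [str(s) for s in matches]

print(texts)     # ['Sleeves', 'Full Sleeves']
print(texts[1])  # Full Sleeves
```

Note that the fixed index is fragile: if any other text on the page (a product title, for example) also contains 'Sleeve', the value may no longer be at position 1.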

Need to get a specific node using Beautiful Soup?
