Question

I want to scrape data from websites using Beautiful Soup and requests. I can already extract the data I want, but now I want to filter it:

from bs4 import BeautifulSoup
import requests
url = "website.com"
keyword = "22222"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, 'lxml')

for article in soup.find_all('a'):
    for a in article:
        if article.has_attr('data-variant-code'):
            print(article.get("data-variant-code"))

Let's say this prints the following: 11111 22222 33333

How can I filter this so that it only returns "22222"?

Answer 1

If the attribute value is a single string of codes separated by spaces and you want the second one, split the string on the space character. This gives you a list of strings, and you can then access the second item of the list (index 1).

For example:

print(article.get("data-variant-code").split(" ")[1])

Result: 22222
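
As a rough sketch of the same idea, assuming the attribute really does hold all codes in one space-separated string (the raw value below is made up for illustration), you could also filter by the keyword instead of relying on the position:

```python
# Hypothetical attribute value holding several codes in one string
raw = "11111 22222 33333"
keyword = "22222"

# split() with no argument splits on any run of whitespace
codes = raw.split()

# Positional access, as in the answer above
second = codes[1]

# Keyword-based filter, which does not depend on the order of the codes
matches = [code for code in codes if code == keyword]

print(second)
print(matches)
```

The positional version breaks if the order of the codes changes, so the keyword comparison is probably the safer choice here.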

Answer 2

Assuming that article.get("data-variant-code") returns one code per link (11111, then 22222, then 33333), you can simply use an if statement:

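for article in soup.find_all('a'):
    # Skip links that do not carry the attribute.
    # (The inner "for a in article" loop was dropped: it repeated the
    # check once per child node and did nothing for empty tags.)
    if article.has_attr('data-variant-code'):
        x = article.get("data-variant-code")
        # keyword is "22222", as defined in the question
        if x == keyword:
            print(x)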
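
Beautiful Soup can also do the comparison for you. Here is a minimal sketch, assuming each link carries exactly one code in its data-variant-code attribute; the HTML snippet is invented sample data, and "html.parser" is used only so the example runs without lxml:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page
html = """
<a data-variant-code="11111">one</a>
<a data-variant-code="22222">two</a>
<a data-variant-code="33333">three</a>
<a href="/about">no code here</a>
"""
keyword = "22222"

soup = BeautifulSoup(html, "html.parser")

# Match only <a> tags whose data-variant-code equals the keyword
matching = soup.find_all("a", attrs={"data-variant-code": keyword})

# Collect the attribute values of the matching tags
codes = [tag["data-variant-code"] for tag in matching]
print(codes)
```

Passing the attribute through attrs avoids the manual has_attr check, since tags without the attribute never match.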

Python - Beautiful Soup - How to filter the extracted data for keywords?
