How to scrape four different classes into one single list

Posted: 2018-08-22 13:46:56

Tags: python list for-loop beautifulsoup

I am trying to build a restock monitor for the Supreme restock page (https://www.supremecommunity.com/restocks/eu/).

I might be going about this completely the wrong way, but the only solution I came up with was something like this:

import sys
import time

import requests
from bs4 import BeautifulSoup as soup

while True:
    try:
        names = []      # product names (avoid shadowing the built-in `list`)
        colorways = []  # colour / size strings
        images = []     # thumbnail URLs
        links = []      # shop links built from data-itemid

        url = 'https://www.supremecommunity.com/restocks/eu/'
        bs4 = soup(requests.get(url).text, "html.parser")

        for tag in bs4.findAll('h5', {'class': 'handle restock-name'}):
            names.append(tag.string)

        for tag2 in bs4.findAll('h6', {'class': 'restock-colorway'}):
            colorways.append(tag2.string)

        for tag3 in bs4.findAll('img', {'class': 'l2d-image size-thumbnail'}):
            images.append(tag3['data-src'])

        for tag4 in bs4.findAll('div', {'class': 'message-item restock-item'}):
            itemid = tag4['data-itemid']
            links.append('http://www.supremenewyork.com/shop/' + itemid)

        # The four lists are parallel: index y refers to the same item in each
        for y, name in enumerate(names):
            print(name + colorways[y] + ' - ' + images[y] + ' - ' + links[y])

        sys.exit()
    except requests.RequestException as e:
        print('Request failed:', e)
        time.sleep(10)  # wait before retrying instead of hammering the site

and it does print out what I want:

Cutouts Tee( Terra Cotta - XLarge ) - http://assets.supremenewyork.com/156668/sm/laJkUkh_sRA.jpg - http://www.supremenewyork.com/shop/303505
Nylon Plaid Pullover( Green - XLarge ) - http://assets.supremenewyork.com/156221/sm/Gcd63F5PQKk.jpg - http://www.supremenewyork.com/shop/303455
Classic Script Hooded Sweatshirt( Yellow - Medium ) - http://assets.supremenewyork.com/156738/sm/sDr4Bi5w3bU.jpg - http://www.supremenewyork.com/shop/303512
Cordura® S Logo 6-Panel( Black - N/A ) - http://assets.supremenewyork.com/156721/sm/OsCNYeO_y4U.jpg - http://www.supremenewyork.com/shop/303511
Vertical Logo Baseball Jersey( Black - Medium ) - http://assets.supremenewyork.com/156286/sm/PIyVb6Gwgrk.jpg - http://www.supremenewyork.com/shop/303463
Perforated Leather Hooded Sweatshirt( Black  - Medium ) - http://assets.supremenewyork.com/156740/sm/GnenfJ06zQg.jpg - http://www.supremenewyork.com/shop/303513
Bedroom Tee( Bright Blue - Large ) - http://assets.supremenewyork.com/156682/sm/ZHITQZ65f1I.jpg - http://www.supremenewyork.com/shop/303507
Fuck You Tee( Black - Large ) - http://assets.supremenewyork.com/156653/sm/Hbytan_5dmM.jpg - http://www.supremenewyork.com/shop/303504

but keeping four separate lists feels like too much, and I suspect it would make it harder to build a monitor out of it. How can I collect all of this into one single list, so that a line like

 Cutouts Tee( Terra Cotta - XLarge ) - http://assets.supremenewyork.com/156668/sm/laJkUkh_sRA.jpg - http://www.supremenewyork.com/shop/303505

becomes one entry in that list? This might not be the best approach, so if you have a better one, feel free to suggest it!
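In other words, each item should become one record instead of being spread across four lists. With the four parallel lists from the code above, the built-in zip already produces that shape. A minimal sketch, assuming index i refers to the same item in every list; merge_restocks is a hypothetical helper name, the sample values are copied from the output above, and how the text splits between name and colorway is a guess:

```python
def merge_restocks(names, colorways, images, links):
    """Combine four parallel lists into one list of (name, colorway, image, link) tuples.

    Assumes index i in every list refers to the same item. zip stops at the
    shortest list instead of raising IndexError.
    """
    return list(zip(names, colorways, images, links))


# Hypothetical sample values in the same shape as the scraped lists
names = ["Cutouts Tee", "Bedroom Tee"]
colorways = ["( Terra Cotta - XLarge )", "( Bright Blue - Large )"]
images = [
    "http://assets.supremenewyork.com/156668/sm/laJkUkh_sRA.jpg",
    "http://assets.supremenewyork.com/156682/sm/ZHITQZ65f1I.jpg",
]
links = [
    "http://www.supremenewyork.com/shop/303505",
    "http://www.supremenewyork.com/shop/303507",
]

restocks = merge_restocks(names, colorways, images, links)
for name, colorway, image, link in restocks:
    print(name + colorway + " - " + image + " - " + link)
```

This only works as long as the four lists stay aligned; if one selector matches an extra element (an advertisement, say), every record after it is silently shifted.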

1 answer:

Answer 0 (score: 2)

Try to find a container element that contains the name, color, etc. for the given item, and then find the properties you need by searching among its child elements.

For example, for the page you are trying to scrape, it might be div.restock-item:

for item in bs4.findAll('div', {'class': 'restock-item'}):
    # Filter away advertisements, which are also wrapped in `restock-item`:
    if item.find('div', {'class': 'user-detail'}):
        name = item.find('h5', {'class': 'handle restock-name'}).string
        color = item.find('h6', {'class': 'restock-colorway'}).string
        # fetch thumbnails, etc. in the same fashion
        print(name + color)
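Putting the pieces together, here is a sketch of the container approach that returns one dict per item, which is the single list the question asks for. It assumes the class names and the data-itemid attribute used in the question's code still match the live page, and that data-itemid sits on the restock-item container itself. parse_restocks and SAMPLE_HTML are hypothetical names, and the markup is a hand-written stand-in, not a copy of the real page. The function takes an HTML string so it can be tried without a network request; on the real site you would pass in requests.get(url).text.

```python
from bs4 import BeautifulSoup


def parse_restocks(html):
    """Return one dict per restock item, reading every field from the same container."""
    page = BeautifulSoup(html, "html.parser")
    items = []
    # class_="restock-item" matches any div whose class list contains restock-item
    for item in page.find_all("div", class_="restock-item"):
        # Skip advertisements, which also use restock-item but lack a user-detail block
        if item.find("div", class_="user-detail") is None:
            continue
        name = item.find("h5", class_="restock-name")
        color = item.find("h6", class_="restock-colorway")
        image = item.find("img", class_="l2d-image")
        link = None
        if item.has_attr("data-itemid"):  # assumption: the ID lives on the container
            link = "http://www.supremenewyork.com/shop/" + item["data-itemid"]
        items.append({
            "name": name.get_text(strip=True) if name else None,
            "color": color.get_text(strip=True) if color else None,
            "image": image.get("data-src") if image else None,
            "link": link,
        })
    return items


# Hypothetical markup mirroring the selectors from the question
SAMPLE_HTML = """
<div class="message-item restock-item" data-itemid="303505">
  <div class="user-detail"></div>
  <h5 class="handle restock-name">Cutouts Tee</h5>
  <h6 class="restock-colorway">( Terra Cotta - XLarge )</h6>
  <img class="l2d-image size-thumbnail" data-src="http://assets.supremenewyork.com/156668/sm/laJkUkh_sRA.jpg">
</div>
<div class="message-item restock-item">
  <p>advertisement</p>
</div>
"""

restocks = parse_restocks(SAMPLE_HTML)
for r in restocks:
    print(r["name"] + r["color"] + " - " + r["image"] + " - " + r["link"])
```

Because every field is read from inside the same container, an advertisement or a missing thumbnail can no longer push the fields of different items out of alignment, and a list of dicts is easy to compare between polls when turning this into a monitor.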