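How to loop through a list exhaustively

Time: 2021-03-25 20:25:00

Tags: python list loops web-scraping

I am trying to scrape the suggestions from a search bar. For example, when I search for apple, 5 suggestions listing other fruits appear. I want to loop through that list of suggestions and collect the further suggestions listed for each of those 5 suggested fruits. From the new list of suggestions, I want to check that there are no duplicates, then visit the urls of the new suggestions, and keep doing this until every term in the suggestion list has been visited.

For example, I search for apple. Suppose the suggestions are [banana, pear, orange, grape, peach].

I then visit each suggestion to get its new suggestions. So, say: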

**Suggested Term**           **New Suggestions**
    banana                   [apple, orange, blueberry, strawberry]
    pear                     [peach, plum, grape]
    orange                   [lemon, grapefruit, lime, tangerine]
    grape                    [blueberry, strawberry, blackberry, cherry]
    peach                    [nectarine, plum, apricot]

As you can see, there are duplicates among the new suggestions. I would check for duplicates, remove them, and then append the new search terms to the original list that I want to keep iterating through.

For example, the list of new suggestions would be new_suggestions = [apple, orange, blueberry, strawberry, peach, plum, grape, lemon, grapefruit, lime, tangerine, blueberry, strawberry, blackberry, cherry, nectarine, plum, apricot]

After cross-checking against the original list of suggested terms, I remove [apple, orange, grape, peach] to get new_suggestions = [blueberry, strawberry, plum, lemon, grapefruit, lime, tangerine, blueberry, strawberry, blackberry, cherry, nectarine, plum, apricot]

I then remove the duplicates within new_suggestions to get: new_suggestions = [blueberry, strawberry, plum, lemon, grapefruit, lime, tangerine, blackberry, cherry, nectarine, apricot]

I append the new suggestions to the original list of suggested terms to get [apple, banana, pear, orange, grape, peach, blueberry, strawberry, plum, lemon, grapefruit, lime, tangerine, blackberry, cherry, nectarine, apricot]
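
The three steps just described (cross-check against the known terms, remove duplicates, append) could be combined into a single pass. The following is only a minimal sketch: `merge_new_terms` is a hypothetical helper name, and it works on plain strings rather than on Selenium elements.

```python
def merge_new_terms(known, candidates):
    """Append candidates not yet in `known`, keeping first-seen order.

    Returns the list of terms that were actually added.
    """
    seen = set(known)          # set membership tests are O(1)
    added = []
    for term in candidates:
        # skip blanks, terms already known, and repeats within candidates
        if not term or term in seen:
            continue
        seen.add(term)
        added.append(term)
    known.extend(added)
    return added


known = ["apple", "banana", "pear", "orange", "grape", "peach"]
candidates = [
    "apple", "orange", "blueberry", "strawberry", "peach", "plum", "grape",
    "lemon", "grapefruit", "lime", "tangerine", "blueberry", "strawberry",
    "blackberry", "cherry", "nectarine", "plum", "apricot",
]

added = merge_new_terms(known, candidates)
print(added)
print(known)
```

With the example data, `added` holds the 11 new fruits and `known` ends up as the 17-term list shown above.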

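I want to keep looping through the list until all terms have been visited and no more suggestions are added to the list. How can I do this?

Here is my code: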

    from selenium.common.exceptions import NoSuchElementException

    # `driver` is an already-initialised Selenium WebDriver showing the first results page
    searches, urls, results = [], [], []
    new_searches, new_urls, newest_urls = [], [], []

    # get the suggestions listed for the first search term
    suggestions = driver.find_element_by_xpath('//*[@id="search-associates"]').find_elements_by_tag_name('a')
    for i in suggestions:
        searches.append(i.text)
        urls.append(i.get_attribute('href'))

    # remove the last entry because it is blank
    urls.pop()


    # iterate through the url of each suggestion
    for i in urls:
        # driver makes a new request to each url
        driver.get(i)
        results.append(driver.find_element_by_xpath('/html/body/div/div[4]/h2/span').text)

        # not all urls will have suggestions
        try:
            new_suggestions = driver.find_element_by_xpath('//*[@id="search-associates"]').find_elements_by_tag_name('a')
        except NoSuchElementException:
            continue

        for x in new_suggestions:
            new_searches.append(x.text)
            new_urls.append(x.get_attribute('href'))

        # drop urls that are already in the original url list
        new_urls = [elem for elem in new_urls if elem not in urls]


    # remove duplicates within new_urls
    for y in new_urls:
        if y not in newest_urls:
            newest_urls.append(y)


    # remove blanks
    newest_urls = [x for x in newest_urls if x is not None]

    # add the newest urls to the original url list to keep iterating through
    urls.extend(newest_urls)
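
In the code above, `urls.extend(newest_urls)` only runs after the `for` loop has finished, so the newly added urls are never visited. One common way to keep going until nothing new turns up is a work queue plus a set of already-seen terms. The sketch below illustrates that idea under some assumptions: `crawl_suggestions` and `fetch_suggestions` are hypothetical names, and `FAKE_SITE` is a stand-in dictionary built from the example table rather than real scraped data.

```python
from collections import deque


def crawl_suggestions(start_terms, fetch_suggestions):
    """Visit every term reachable from `start_terms` exactly once.

    `fetch_suggestions` is a hypothetical callable that takes a term and
    returns the list of suggestions shown for it (possibly empty).
    """
    seen = set()       # every term ever queued, so nothing is queued twice
    queue = deque()    # terms waiting to be visited
    visited = []       # visit order, for inspection

    for term in start_terms:
        if term and term not in seen:
            seen.add(term)
            queue.append(term)

    # the loop ends only when no unvisited term is left
    while queue:
        term = queue.popleft()
        visited.append(term)
        for suggestion in fetch_suggestions(term):
            if suggestion and suggestion not in seen:
                seen.add(suggestion)
                queue.append(suggestion)

    return visited


# Stand-in for the website: the suggestions from the example table.
FAKE_SITE = {
    "apple": ["banana", "pear", "orange", "grape", "peach"],
    "banana": ["apple", "orange", "blueberry", "strawberry"],
    "pear": ["peach", "plum", "grape"],
    "orange": ["lemon", "grapefruit", "lime", "tangerine"],
    "grape": ["blueberry", "strawberry", "blackberry", "cherry"],
    "peach": ["nectarine", "plum", "apricot"],
}

visited = crawl_suggestions(["apple"], lambda term: FAKE_SITE.get(term, []))
print(visited)
```

With Selenium, `fetch_suggestions` would presumably take a url instead of a term, call `driver.get(url)`, and return the `href` values of the suggestion links, returning an empty list when the page has no suggestions. Because every url is added to `seen` before it is queued, the separate cross-check and de-duplication passes are no longer needed.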

Thank you for taking the time to look at my question and for any help you can offer. I appreciate it.

0 answers:

No answers