Can't find a combination of keywords in the <image:title> tag of XML

Time: 2018-12-11 03:13:43

Tags: python beautifulsoup xml-parsing keyword

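I want to find the <loc> tag in which two keywords are both present. For example, I want to find the one that contains 'Yankee' AND 'duck'. The XML I am searching contains <loc> and <image:title> elements. Here is my code: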

elif len(keywords) == 2:
    keyword1 = keywords[0]
    keyword2 = keywords[1]

    print("Searching for product...")
    keywordLinkFound = False
    while keywordLinkFound is False:
        html = self.driver.page_source
        soup = BeautifulSoup(html, 'lxml')
        try:
            keywordLink = soup.find('image:title', text=re.compile(keyword1 + keyword2)).text
            return keywordLink
        except AttributeError:
            print("Product not found on site, retrying...")
            time.sleep(monitorDelay)
            self.driver.refresh()
        break

1 Answer:

Answer 0: (score: 0)

I would do this with a search function (a function passed to find_all as the filter), since it gives you more control over the search criteria:

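def desired_tags(tag):
    # Text content of the candidate tag
    text = tag.get_text()

    # Match only <image:title> tags whose text contains both keywords
    return (tag.name == 'image:title'
            and 'Yankee' in text and 'duck' in text)

# find_all accepts a function and returns every tag for which it returns True
results = soup.find_all(desired_tags)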
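
As for why the original attempt finds nothing: keyword1 + keyword2 is plain string concatenation, so with 'Yankee' and 'duck' the pattern becomes 'Yankeeduck', which only matches when the two words are directly adjacent in that order. If the keywords come from a list, as in the question, the filter could be generalized along these lines. This is only a sketch: make_keyword_matcher is a hypothetical helper name, and the case-insensitive matching is an assumption that was not part of the question.

```python
import re


def make_keyword_matcher(tag_name, keywords):
    """Build a filter for soup.find_all() that requires every keyword."""
    # One case-insensitive pattern per keyword; re.escape treats each
    # keyword as literal text rather than as a regular expression.
    patterns = [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]

    def matcher(tag):
        # Only consider tags with the requested name.
        if tag.name != tag_name:
            return False
        text = tag.get_text()
        # all() is the AND: each keyword must occur somewhere in the text,
        # in any order and with anything in between.
        return all(p.search(text) for p in patterns)

    return matcher
```

It would then be called as soup.find_all(make_keyword_matcher('image:title', keywords)), which works for any number of keywords rather than exactly two.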