Question

I used Beautiful Soup to collect some links on a web page that meet certain criteria. Here is the code:

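    import requests
    from bs4 import BeautifulSoup
    url = 'http://www.somesite.com/stats/'
    r = requests.get(url)  # assumed: the post never shows how r was obtained
    soup = BeautifulSoup(r.content, 'html.parser')
    links_list = soup.find_all('a', attrs={'class': 'stats'}, href=True)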

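links_list is a list of about 10 different links, each with its own text and all using the same HTML tag. I have a list of words that I want to check against the text of those links. Basically, I am trying to see whether every element of listt appears in the string between the HTML tags of each element of links_list.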

Here is an example.

listt = ['big', 'letters']
for link in links_list[:]:
    for word in listt:
        if word not in link.get_text().lower():
            links_list.remove(link)

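I thought this was the right approach because I am iterating over a copy of the list. Every resource I came across said to make a copy of the list and iterate over that. However, I get the following error.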

  File "src\stats_finder.py", line 59, in find_item
    links_list.remove(link)
ValueError: list.remove(x): x not in list

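In my case, I want to end up with only the links whose text between the HTML tags contains all of the keywords. Am I going about this the wrong way, or is there a more efficient way? I considered using all(), but could not work out a solution with it either.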

Answer 1

I also ran into a similar problem.

Python: ValueError: list.remove(x): x not in list
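
The error in the question can be reproduced without Beautiful Soup. In the sketch below, plain strings stand in for the link texts, and the data and the function name `keep_matching` are assumptions made for illustration. When a text is missing more than one keyword, the inner loop calls remove on it a second time, which raises the ValueError. Leaving the inner loop after the first removal avoids that.

```python
listt = ['big', 'letters']
# Plain strings standing in for link.get_text() results (hypothetical data).
texts = ['hello world', 'big hi letters', 'big superman']

# Reproduce the error: 'hello world' lacks both words, so it is removed twice.
broken = list(texts)
try:
    for text in broken[:]:
        for word in listt:
            if word not in text.lower():
                broken.remove(text)
except ValueError as exc:
    print('ValueError:', exc)


def keep_matching(items, words):
    """Return the items that contain every word, removing each miss only once."""
    result = list(items)
    for item in result[:]:      # iterate over a copy, as in the question
        for word in words:
            if word not in item.lower():
                result.remove(item)
                break           # stop after the first missing word
    return result


print(keep_matching(texts, listt))  # ['big hi letters']
```

Iterating over a copy protects the outer loop, but it does not stop the inner loop from removing the same element twice.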

Answer 2

This can be done more simply with the all function.

listt = ['big', 'letters']
links_set = set(['hello', 'hi', 'big', 'cccc', 'letters', 'anotherword'])

all_are_present = all([word in links_set for word in listt]) # True

Edit

I think what you are trying to do is check whether every word in listt is in the text string of every HTML element, in which case it should be:

listt = ['big', 'letters']
links_text_list = ['hello letters', 'big hi letters', 'big superman letters']

all_are_present = all([word in text for word in listt for text in links_text_list]) # False because "hello letters" doesn't have big

However, since you only want the links that contain all the words in listt, you can use the filter function.

links_with_all_words = list(filter(lambda text: all([word in text for word in listt]), links_text_list))
print(links_with_all_words) # ['big hi letters', 'big superman letters']
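
The same filtering can be applied to the link objects themselves rather than to a list of strings. This is only a sketch: `FakeLink` is a hypothetical stand-in that mimics nothing but the `get_text()` method of a Beautiful Soup tag, so the example runs without fetching a page, and `has_all_words` is a helper name made up for illustration.

```python
class FakeLink:
    """Hypothetical stand-in for a bs4 Tag; only get_text() is mimicked."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


listt = ['big', 'letters']
links_list = [
    FakeLink('Hello Letters'),
    FakeLink('Big hi letters'),
    FakeLink('BIG superman letters'),
]


def has_all_words(link, words):
    """True if the lower-cased link text contains every word as a substring."""
    text = link.get_text().lower()
    return all(word in text for word in words)


# Build a new list instead of removing items from the one being iterated.
links_with_all_words = [link for link in links_list if has_all_words(link, listt)]
print([link.get_text() for link in links_with_all_words])
# ['Big hi letters', 'BIG superman letters']
```

Building a new list avoids calling remove at all, so the ValueError from the question cannot occur.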

How to check whether all elements of a list are contained in a string