Question

I am extracting the "title" of pages scraped with BeautifulSoup by searching for their tag:

title = [text.find_all('h1', {'class': 'entry-title'}) for text in texts]

The output is a list like this:

[[<h1 class="entry-title">Receita de pão caseiro fácil para iniciantes</h1>],
 [<h1 class="entry-title">Pão branco com fermentação natural</h1>],... etc]

I want to remove the tags from the list and keep only the title text.

How can I do this?
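A minimal sketch of one common approach, assuming each item in `texts` is a BeautifulSoup object (built here from two hypothetical HTML snippets that reuse the titles shown above). Instead of keeping the `Tag` objects, it calls `get_text(strip=True)` on each matched `<h1>`, which returns only the text between the tags. The attribute filter is passed as a dict, `{'class': 'entry-title'}`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages standing in for the scraped `texts` list
html_pages = [
    '<html><body><h1 class="entry-title">Receita de pão caseiro fácil para iniciantes</h1></body></html>',
    '<html><body><h1 class="entry-title">Pão branco com fermentação natural</h1></body></html>',
]
texts = [BeautifulSoup(page, "html.parser") for page in html_pages]

# find_all returns a list of Tag objects for each page; get_text(strip=True)
# drops the <h1> markup and keeps only the title string.
titles = [
    [h1.get_text(strip=True) for h1 in text.find_all("h1", {"class": "entry-title"})]
    for text in texts
]
print(titles)
# [['Receita de pão caseiro fácil para iniciantes'], ['Pão branco com fermentação natural']]
```

If every page has exactly one title, `text.find(...)` returns a single tag instead of a list, so the result can be a flat list of strings; note that `find` returns `None` when nothing matches, and calling `get_text()` on `None` raises `AttributeError`.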

Answer 1

You can do this with the extract() or decompose() methods.
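A minimal sketch of what the two methods named in this answer do, using a hypothetical HTML snippet. `extract()` detaches a tag from the tree and returns it, while `decompose()` removes the tag and destroys it, returning `None`. Both take the tag's text along with the tag, so the sketch reads the title from the extracted tag with `get_text()`; that last step is an assumption added here, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one title plus some body content
html = (
    '<div><h1 class="entry-title">Pão branco com fermentação natural</h1>'
    '<p>Ingredientes...</p></div>'
)

# extract(): detach the <h1> from the tree; the detached Tag is returned
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", {"class": "entry-title"}).extract()
title = h1.get_text(strip=True)      # the text is still available on the returned tag
remaining_after_extract = str(soup)  # the <h1> is gone from the document

# decompose(): delete the <h1> in place; nothing is returned to read from
soup2 = BeautifulSoup(html, "html.parser")
soup2.find("h1", {"class": "entry-title"}).decompose()
remaining_after_decompose = str(soup2)

print(title)                      # Pão branco com fermentação natural
print(remaining_after_extract)    # <div><p>Ingredientes...</p></div>
print(remaining_after_decompose)  # <div><p>Ingredientes...</p></div>
```

The choice between them comes down to whether you still need the removed element: `extract()` keeps it, `decompose()` frees it.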

Answer 2

library(dplyr)  # provides %>% and mutate()

microbenchmark::microbenchmark(
  a = mtcars %>% mutate(),
  b = mtcars %>% mutate(myfun())
)
# Unit: milliseconds
#  expr        min         lq       mean     median        uq        max neval
#     a   1.872101   2.165801   2.531046   2.312051   2.72835   4.861202   100
#     b 546.916301 571.909551 603.528225 589.995251 612.20240 798.707300   100

How to remove tags from a list scraped with BeautifulSoup?

2 answers: