Question

I am using BeautifulSoup to extract the text from the <p> tags in some HTML data, with the following code:

from bs4 import BeautifulSoup

for i in data:
    soup = BeautifulSoup(i, 'html')
    print(' '.join(map(lambda e: e.string, soup.find_all('p'))))

Here data is a list in which each element is a string containing HTML. My problem is that this works for some examples, but for others it raises

TypeError: sequence item 1: expected string or Unicode, NoneType found

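for the second line inside the loop above (the print call). Can anyone explain why this happens? Alternatively, is there a way to check for and skip the examples that cause this error?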

Answer 1

Try selecting only the p tags that contain some text:

' '.join(el.string for el in soup.find_all('p', text=True))
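
As for why the error occurs: .string returns None whenever a tag has more than one child (for example, a <p> holding both plain text and a <b>), and str.join refuses None items. If you would rather keep such paragraphs than skip them, get_text() may be a better fit, since it concatenates all nested strings. A minimal sketch, assuming the text of nested tags should be kept; paragraph_text and the sample data are hypothetical names made up for illustration:

```python
from bs4 import BeautifulSoup


def paragraph_text(html):
    """Return the text of every <p> in html, joined by single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() gathers every nested string, so it never yields None,
    # unlike .string, which is None when a tag has several children.
    parts = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    # Drop empty paragraphs so they do not add stray spaces.
    return " ".join(part for part in parts if part)


# Hypothetical sample input: the second <p> mixes text and a nested tag.
data = [
    "<p>plain</p><p>mixed <b>bold</b> text</p>",
    "<p></p><p>only one</p>",
]

for html in data:
    print(paragraph_text(html))
```

With this approach the mixed paragraph contributes "mixed bold text" instead of triggering the TypeError, and the empty <p> is simply left out.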