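I am extracting all of the <ul> tags that appear in the body of a page and concatenating each one with the <p> tag that immediately precedes it.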
import requests
from bs4 import BeautifulSoup
text = BeautifulSoup(requests.get('http://www.getspokal.com/how-to-create-content-based-on-your-customers-pain-points/', timeout=7.00).text, 'html.parser')
I pass a filter function to Beautiful Soup to pull out the appropriate tags:
def funct(tag):
    return tag.name == 'ul' and not tag.attrs and not tag.li.attrs and not tag.a
ul_tags = text.find_all(funct)
This pulls out three <ul> tags. Now I find the <p> tag immediately preceding each <ul> tag and concatenate the two:
combined = [(ul.find_previous("p") + ul) for ul in ul_tags]
This produces the following error:
TypeError: unsupported operand type(s) for +: 'Tag' and 'Tag'
One of the results should be:
<p>For example, if you’re in the pet food industry, you might ask your existing customers:</p><ul><li>What challenges do you face on a regular basis with regards your pets?</li><li>Are there any underlying health issues that your pets have that causes you concern?</li><li>What is your biggest struggle when choosing appropriate food for your pet? </li></ul>
Where is the list comprehension going wrong?
Answer 0 (score: 3)
You should change the list comprehension to:
combined = [(str(ul.find_previous("p")) + str(ul)) for ul in ul_tags]
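The problem is that ul is not a string. It is a bs4.element.Tag, and Tag does not support the + operator, so you have to convert each tag to a string before concatenating.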
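As a minimal offline sketch of the same idea, the snippet below reproduces the error and the fix on a small inline document. The HTML string and the variable names (html, soup, raised) are made up for illustration and are not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the fetched page.
html = """
<p>Intro paragraph.</p>
<p>Questions to ask:</p>
<ul><li>First question?</li><li>Second question?</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul")
p = ul.find_previous("p")  # nearest <p> that appears before the <ul>

# Adding two Tag objects raises TypeError, as in the question.
try:
    p + ul
    raised = False
except TypeError:
    raised = True

# Converting each Tag to its markup string first turns + into
# ordinary string concatenation.
combined = str(p) + str(ul)
print(combined)
```

Note that the result is a plain string of markup, not a Tag, so it would need to be parsed again before any further Beautiful Soup navigation.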