Question

I am trying to extract data from a div whose className is "", and then from the p tags inside it. My HTML looks like this:

<div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>

I am using BeautifulSoup, and this is my code to extract the data:

reviewAllList = [row.text for row in soup.find_all('div',attrs={"class" : ""})]

However, I cannot extract the correct data this way. Am I missing something? I am using Python 3.5.

Answer 1

You can print just the text by asking for it explicitly.

sometxt = <div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>

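Now just call print(sometxt.text). If you are only after the div's class value (""), you can print it with print(sometxt['class']). Remember that you will probably need to do this in a for loop over your findAll results (if there are several classes).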

**row.text**
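
To make the loop mentioned above concrete, here is a minimal sketch. The HTML is a shortened, hypothetical stand-in for the review markup in the question, and it uses `div.get("class")` rather than `div['class']` so that a div without a class attribute does not raise a KeyError. The exact value printed for an empty class attribute may differ between BeautifulSoup versions, so treat that part as illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the markup in the question.
html = (
    '<div class=""><p>Review A, part 1.</p><p>Review A, part 2.</p></div>'
    '<div class=""><p>Review B.</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

texts = []
for div in soup.find_all("div"):
    # .get() returns None instead of raising when the attribute is missing.
    print(div.get("class"))
    # .text concatenates the text of every descendant, with no separator.
    texts.append(div.text)

print(texts)
```

Note that `.text` glues the paragraphs of one div together without any whitespace between them, which may or may not be what you want.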

Answer 2

I assume you only want to get the text from the paragraphs.

You can do the following:

text_list = []  # collects the text of each paragraph
mydiv = soup.find("div", { "class" : "" })
for p in mydiv.find_all('p'):
    text_list.append(p.get_text())

or

mydiv = soup.find("div", { "class" : "" })
text = mydiv.find('p').get_text()

I can't test this right now, but based on my experience with BS it should work fine.

Edit: tested and corrected it.
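
For comparison, a self-contained sketch of the two variants above. The HTML is a made-up, shortened sample, and because it contains only one div the sketch selects it with a plain `soup.find("div")` instead of the class filter, which is an assumption made to keep the example focused on the difference between `find_all('p')` and `find('p')`.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample with a single div.
html = '<div class=""><p>First paragraph.</p><p>Second paragraph.</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Assumption: only one div exists, so no class filter is needed here.
mydiv = soup.find("div")

# Variant 1: one list entry per paragraph.
all_paragraphs = [p.get_text() for p in mydiv.find_all("p")]

# Variant 2: find() stops at the first match, so only the first paragraph is returned.
first_paragraph = mydiv.find("p").get_text()

print(all_paragraphs)
print(first_paragraph)
```

So the second variant is only enough if each review consists of a single paragraph.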

Answer 3

Use a lambda to find all divs that have an empty class attribute and whose first child is a p:

rows = [str(row.get_text(strip=True)) for row in soup.find_all(lambda tag: tag.name == "div" and ("class" not in tag.attrs or not len(" ".join(tag["class"]))) and tag.findChildren()[0].name == "p")]
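
The same idea may be easier to read as a named function. The sketch below is an adaptation, not the exact one-liner: the function name and sample HTML are made up, it guards against divs that have no child tags at all (indexing `findChildren()[0]` would raise an IndexError there), and it joins paragraphs with a space via `get_text(" ", strip=True)`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: only the second div should match.
html = (
    '<div class="header"><p>Site banner</p></div>'
    '<div class=""><p>Review one.</p><p>More of review one.</p></div>'
    '<div><span>No paragraph first</span></div>'
    '<div class=""></div>'
)
soup = BeautifulSoup(html, "html.parser")


def is_unclassed_div_with_p(tag):
    """True for a div with a missing or empty class whose first child tag is a p."""
    if tag.name != "div":
        return False
    # A missing class falls back to []; an empty one joins to an empty string.
    if " ".join(tag.get("class", [])).strip():
        return False
    # First direct child tag, or None when the div has no child tags.
    first_child = tag.find(True, recursive=False)
    return first_child is not None and first_child.name == "p"


rows = [div.get_text(" ", strip=True) for div in soup.find_all(is_unclassed_div_with_p)]
print(rows)
```

Passing a function to `find_all` means every tag is tested explicitly, so the result does not depend on how an empty string is matched against the class attribute.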

Extract 'p' data from a 'div' with Class of div = ""
