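Extracting 'p' data from a 'div' that has class=""

Time: 2017-03-30 12:01:03

Tags: python-3.x beautifulsoup

I am trying to extract the data inside a div whose class name is "", and then from the p tags inside it. My HTML looks like this: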

<div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>

I am using BeautifulSoup, and this is my code for extracting the data:

reviewAllList = [row.text for row in soup.find_all('div',attrs={"class" : ""})]

However, I cannot extract the correct data with it. Am I missing something? I am using Python 3.5.

3 Answers:

Answer 0: (score: 0)

You can print just the text by asking for it explicitly.

sometxt = <div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>

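Now just call print(sometxt.text), where sometxt is the parsed Tag rather than a raw string. If you are only after the div's class value (=> "" <), you can print it with print(sometxt['class']). Keep in mind that you may need to do this in a for loop over your findAll results (if there are multiple classes).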

**row.text**
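As a minimal sketch of the idea above: the HTML string here is a shortened, hypothetical stand-in for the review markup in the question, and `html.parser` is an assumed parser choice. It shows that `.text` and the class lookup work once `sometxt` is a parsed Tag.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the review HTML in the question (assumption).
html = '<div class=""><p>First paragraph.</p><p>Second paragraph.</p></div>'

soup = BeautifulSoup(html, "html.parser")
sometxt = soup.find("div")  # the first <div> Tag in the document

div_text = sometxt.text           # text of all descendants, concatenated
div_class = sometxt.get("class")  # class attribute value, or None if absent

print(div_text)
print(div_class)
```

Using `.get("class")` instead of `sometxt['class']` avoids a KeyError when a div has no class attribute at all.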

Answer 1: (score: 0)

I assume you only want to get the text from the paragraphs.

You can do the following:

text_list = []  # collects the text of each paragraph
mydiv = soup.find("div", { "class" : "" })
for p in mydiv.find_all('p'):
    text_list.append(p.get_text())

# Text of the first paragraph only
mydiv = soup.find("div", { "class" : "" })
text = mydiv.find('p').get_text()

I can't test this right now, but based on my experience with BS, this should work fine.

Edit: tested and corrected it.
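A self-contained sketch of the paragraph loop, for anyone who wants to run it end to end. Note that it swaps in a small predicate function (the hypothetical `has_blank_class`) to select the div, rather than relying on how `{"class": ""}` is matched, and the sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def has_blank_class(tag):
    """True for a <div> whose class attribute is present but empty."""
    if tag.name != "div" or not tag.has_attr("class"):
        return False
    value = tag["class"]
    # Depending on parser settings the value may be a list or a string.
    if isinstance(value, str):
        value = value.split()
    return not any(part.strip() for part in value)


# Hypothetical sample markup: one named-class div, one blank-class div.
html = (
    '<div class="header"><p>Site header</p></div>'
    '<div class=""><p>First review paragraph.</p>'
    '<p>Second review paragraph.</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

text_list = []
for div in soup.find_all(has_blank_class):
    for p in div.find_all("p"):
        text_list.append(p.get_text(strip=True))

print(text_list)
```

Collecting one string per paragraph keeps the two review paragraphs separate, whereas calling `.text` on the whole div would run them together.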

Answer 2: (score: 0)

Use a lambda to find all divs that have an empty (or missing) class attribute and whose first child is a p:

rows = [
    str(row.get_text(strip=True))
    for row in soup.find_all(
        lambda tag: tag.name == "div"
        and ("class" not in tag.attrs or not len(" ".join(tag["class"])))
        and tag.findChildren()  # guard: skip divs that contain no tags
        and tag.findChildren()[0].name == "p"
    )
]
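To see which divs such a filter picks up, here is a rough sketch with made-up markup. The named function `matches` is a hypothetical equivalent of the lambda, written out so each condition can carry a comment; it uses `tag.find(True)` to fetch the first descendant tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup covering the cases the filter distinguishes.
html = (
    '<div class=""><p>Empty class.</p></div>'
    '<div><p>No class attribute.</p></div>'
    '<div class="note"><p>Named class.</p></div>'
    '<div class=""></div>'
    '<div class=""><span>Span first.</span><p>Later p.</p></div>'
)
soup = BeautifulSoup(html, "html.parser")


def matches(tag):
    if tag.name != "div":
        return False
    # Reject any non-blank class; "".join handles list or string values.
    if "".join(tag.get("class") or "").strip():
        return False
    first = tag.find(True)  # first descendant tag, or None if there is none
    return first is not None and first.name == "p"


rows = [row.get_text(strip=True) for row in soup.find_all(matches)]
print(rows)
```

Only the first two divs are kept: the third has a real class name, the fourth contains no tags, and the fifth starts with a span rather than a p.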