I'm trying to extract the data in a div, using "" as the className and then the p tags. My HTML looks like this:
<div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>
I'm using BeautifulSoup, and my code to extract this data is:
reviewAllList = [row.text for row in soup.find_all('div',attrs={"class" : ""})]
However, I can't get the right data out of it. Am I missing something? I'm using Python 3.5.
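One likely cause, as far as I can tell from how BeautifulSoup matches attribute values: an empty string in attrs={"class": ""} is treated much like "no value". The search can therefore also return div elements that have no class attribute at all (outer wrapper divs, for example), and each of those contributes all of its nested text. The sketch below assumes that explanation and uses the built-in html.parser. It keeps only divs that actually carry an empty class attribute. The helper name empty_class_div_texts and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def empty_class_div_texts(html):
    """Return the text of every <div> whose class attribute is present but empty."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div"):
        # Skip divs that have no class attribute at all (e.g. wrapper divs).
        if not div.has_attr("class"):
            continue
        # class is multi-valued, so class="" comes back as [] (or [""] in some
        # older releases); joining and stripping treats both as empty.
        if " ".join(div["class"]).strip():
            continue
        # A separator keeps the text of adjacent <p> tags from running together.
        texts.append(div.get_text(" ", strip=True))
    return texts


html = '<div><div class=""><p>First.</p><p>Second.</p></div><div><p>No class.</p></div></div>'
print(empty_class_div_texts(html))  # ['First. Second.']
```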
Answer 0 (score: 0)
You can print just the text. Say you have:
from bs4 import BeautifulSoup
sometxt = BeautifulSoup("""<div class=""><p>I've been with USAA since 1981 - they've been a good, helpful company and easy to deal with except with making payments on their website. Every time I try to make a payment the website has a problem and I end up calling them. Today, I tried to make a credit card update (same account, different exp. date and code) before I made a payment. The website kept telling me it wouldn't accept the information.</p><p>I called the company to make the payment and was told the system had accepted the information but I couldn't make the payment until tomorrow because of the update. They refused to let me make my payment by phone. 4 times in the past 2 years it wouldn't accept my password, even after I confirmed it by - yes calling in. Other payments have not been accepted for unknown reasons - I've had to call them in. No point having a website if it doesn't work. I avoid calling because it takes so many steps to reach a live person. It's a minor complaint but it happens every time.</p></div></div>""", "html.parser").div
Then just call print(sometxt.text)
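If you are only looking for the div's class => ""
you can print it with print(sometxt['class'])
Remember that you may need to do this in a for loop over your findAll results (if there are multiple matching divs).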
In your own loop, that is **row.text**.
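Following this answer's suggestion, here is a small sketch that loops over the find_all results and reads both .text and the class attribute. I swapped in .get("class") for ['class'] because indexing raises KeyError on a div that has no class attribute. The helper name div_texts_and_classes and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def div_texts_and_classes(html):
    """Collect (class value, text) pairs for every <div> in the document."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("div"):
        # .get() returns None instead of raising KeyError when the attribute is missing.
        rows.append((row.get("class"), row.text))
    return rows


html = '<div class=""><p>One</p><p>Two</p></div><div><p>Three</p></div>'
for cls, text in div_texts_and_classes(html):
    print(cls, repr(text))
```

.text concatenates the strings with no separator, so the first div comes out as "OneTwo". Calling get_text(" ") instead would put a space between the paragraphs.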
Answer 1 (score: 0)
I assume you only want the text from the paragraphs.
You can do the following:
mydiv = soup.find("div", { "class" : "" })
text_list = []  # collects the text of every <p> in the div
for p in mydiv.find_all('p'):
    text_list.append(p.get_text())
Or:
mydiv = soup.find("div", { "class" : "" })
text = mydiv.find('p').get_text()  # text of the first <p> only
I can't test this right now, but from my experience with BS it should work.
Edit: tested and corrected it.
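Here is a sketch that wraps both variants in one helper and guards against soup.find returning None. Without that guard, calling .find_all or .find on None would raise AttributeError. The name paragraph_texts is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return (all paragraph texts, first paragraph text) for the first matching div."""
    soup = BeautifulSoup(html, "html.parser")
    mydiv = soup.find("div", {"class": ""})
    if mydiv is None:
        # No matching div: return empty results instead of crashing.
        return [], None
    # Variant 1: the text of every <p> inside the div.
    text_list = [p.get_text() for p in mydiv.find_all("p")]
    # Variant 2: only the first <p>, if there is one.
    first_p = mydiv.find("p")
    first_text = first_p.get_text() if first_p is not None else None
    return text_list, first_text


print(paragraph_texts('<div class=""><p>A</p><p>B</p></div>'))  # (['A', 'B'], 'A')
```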
Answer 2 (score: 0)
Use a lambda to find all divs whose class attribute is empty (or missing) and whose first child tag is a p:
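# tag.find(True) is the first child tag (None if there is none), so a div without child tags no longer raises IndexError as findChildren()[0] did.
rows = [row.get_text(strip=True) for row in soup.find_all(lambda tag: tag.name == "div" and ("class" not in tag.attrs or not " ".join(tag["class"]).strip()) and tag.find(True) is not None and tag.find(True).name == "p")]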
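As an alternative to the lambda, a CSS attribute selector can express a similar filter. This is a different technique that I'm swapping in here. The sketch assumes BeautifulSoup 4.7 or later, where .select() is backed by the soupsieve package.

There are two differences from the lambda:
- div[class=""] requires the class attribute to be present and empty, so divs with no class at all are skipped.
- The "> p" part keeps only paragraphs that are direct children of such a div.

The helper name empty_class_paragraphs and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def empty_class_paragraphs(html):
    """Return the stripped text of each <p> directly inside a <div class="">."""
    soup = BeautifulSoup(html, "html.parser")
    # [class=""] matches only when the class attribute exists and is empty;
    # "> p" restricts the match to direct <p> children of such a div.
    return [p.get_text(strip=True) for p in soup.select('div[class=""] > p')]


html = (
    '<div><div class=""><p>A</p><p>B</p></div>'
    '<div><p>C</p></div><div class="x"><p>D</p></div></div>'
)
print(empty_class_paragraphs(html))
```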