Reading the second and later <p> tags with BeautifulSoup in Python

Asked: 2016-12-27 06:03:42

Tags: python-2.7 beautifulsoup

I have searched and tried various options for reading the data that appears after the first <p> tag, using Beautiful Soup in Python. The HTML is as follows:

[<div class="user-review">\n<p><strong>THANKS TO MEDIMANAGE\r\n                                 \r\n                            \r\n                                </strong></p>

<p class="lnhgt"> </p><p>Hi Sandeep</p><br/><p>At the onset I would like to thank you very much for your assistance in selecting and purchasing health insurance for my family,</p><br/><p>I am really impressed with the TAT you have maintained and I highly appreciate your advice on which insurance to buy.</p><br/><p>I must say that medimanage has assisted me seamlessly, while I was in the corporate world and today it shows the same interest to help me as an individual.</p><br/><p>Thank you so much and god bless you. May you have all the strength and ability to support many more customers like me.</p><br/><p>Merry christmas and a very happy new year to you and your family.</p><br/>

\n

I used BeautifulSoup, and with the code below I am able to read the text of the first <p> tag, i.e. "THANKS TO MEDIMANAGE".

rev_soup = BeautifulSoup(review.read())
for review in rev_soup.find_all("div",class_="user-review"):
    print(review.p.text)

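What I need is to read the part contained in the p with class "lnhgt", i.e.:

"Hi Sandeep   At the onset I would like to thank you very much for your assistance in selecting and purchasing health insurance for my family,   I am really impressed with the TAT you have maintained and I highly appreciate your advice on which insurance to buy.   I must say that medimanage has assisted me seamlessly, while I was in the corporate world and today it shows the same interest to help me as an individual.   Thank you so much and god bless you. May you have all the strength and ability to support many more customers like me.   Merry christmas and a very happy new year to you and your family."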

2 answers:

Answer 0: (score: 1)

p_tags = soup.find(class_="user-review").find_all('p')
for p in p_tags[1:]:  # iterate over every <p> tag except the first one
    print(p.text)

Out:

Hi Sandeep
At the onset I would like to thank you very much for your assistance in selecting and purchasing health insurance for my family,
I am really impressed with the TAT you have maintained and I highly appreciate your advice on which insurance to buy.
I must say that medimanage has assisted me seamlessly, while I was in the corporate world and today it shows the same interest to help me as an individual.
Thank you so much and god bless you. May you have all the strength and ability to support many more customers like me.
Merry christmas and a very happy new year to you and your family.

Edit:

p_tags = soup.find(class_="lnhgt").find_next_siblings('p')
for p in p_tags:
    print(p.text)
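
The sibling-based version can be tried on its own with a minimal sketch like the one below. The `html` string is a shortened copy of the markup from the question, and the `"html.parser"` argument and the names `marker` and `following` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (assumed here for illustration).
html = """
<div class="user-review">
<p><strong>THANKS TO MEDIMANAGE</strong></p>
<p class="lnhgt"> </p><p>Hi Sandeep</p><br/><p>Thank you so much and god bless you.</p><br/>
</div>
"""

# Naming the parser explicitly keeps the result independent of which parsers are installed.
soup = BeautifulSoup(html, "html.parser")

# Anchor on the empty marker paragraph, then collect only the <p> siblings after it.
marker = soup.find("p", class_="lnhgt")
following = [p.get_text(strip=True) for p in marker.find_next_siblings("p")]

for text in following:
    print(text)
```

Because the search starts at the "lnhgt" paragraph, neither the heading paragraph nor the empty marker paragraph itself ends up in the result, and the <br/> tags between the paragraphs are skipped by the 'p' filter.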

Answer 1: (score: 0)

I think this is what you want.

for review in rev_soup.find_all("div",class_="user-review"):
    print(' '.join(i.text for i in review.find_all('p')[1:]))

For each review, I find all the p tags, then use str.join to concatenate all of them except the first one, and then print the result.
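
One detail worth checking: with the HTML from the question, the slice `[1:]` still includes the whitespace-only "lnhgt" paragraph, so the joined string may begin with stray spaces. A possible variant, sketched below, strips each paragraph and drops the empty ones before joining. The shortened `html` string, the `"html.parser"` argument, and the `results` list are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (assumed here for illustration).
html = """
<div class="user-review">
<p><strong>THANKS TO MEDIMANAGE</strong></p>
<p class="lnhgt"> </p><p>Hi Sandeep</p><br/><p>Thank you so much and god bless you.</p><br/>
</div>
"""

rev_soup = BeautifulSoup(html, "html.parser")

results = []
for review in rev_soup.find_all("div", class_="user-review"):
    # Skip the first <p> (the heading), strip surrounding whitespace from the rest.
    texts = [p.get_text(strip=True) for p in review.find_all("p")[1:]]
    # Drop paragraphs that are empty after stripping, such as the "lnhgt" marker.
    results.append(" ".join(t for t in texts if t))

for joined in results:
    print(joined)
```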