I'm trying to traverse HTML with BeautifulSoup, but it seems I can't reach all of the elements. Here is the original link (http://china-market-research.blogspot.com/2018/10/why-kid-market-is-booming-in-china.html)
I have HTML like this:
<div class="post-body entry-content" id="post-body-2820943256231169701" itemprop="description articleBody">
Why Kid Market is Booming in China ?<br>
<br>
Very simple , look at this video you will get it.<br>
<iframe allow="autoplay; encrypted-media" allowfullscreen="" frameborder="0" height="573" src="https://www.youtube.com/embed/Fg7jIjmLyWs" width="1019"></iframe>
<br>
<br>
<br>
Birth control: a rule, not a pill
China’s two-child policy is having unintended consequences
Reluctant to pay for multiple maternity leaves, companies are choosing not to hire young women
<br>
THE one-child-per-couple policy was horrific for women in China. Many were subjected to forced sterilisations or abortions. Newborn girls were killed, removed by family-planning officials or abandoned by parents desperate that their one permitted baby be a boy. Women from neighbouring countries suffered, too, as victims of human trafficking; a skewed sex-ratio made it more difficult for young men to find Chinese wives. So the government’s announcement in late 2015 that it was relaxing the policy, after 35 years, was good news. Yet the two-child-per-couple policy that replaced it may bring different kinds of problems. source <a href="https://www.economist.com/china/2018/07/26/chinas-two-child-policy-is-having-unintended-consequences">https://www.economist.com/china/2018/07/26/chinas-two-child-policy-is-having-unintended-consequences</a><br>
<blockquote class="tr_bq">
For a generation the government assured women that “one is enough” and that “late marriage and late childbirth are worthy.” Now state media urge them to marry while still in university and remind them that older mothers are more likely to have babies with birth defects, notes Leta Hong Fincher, an author and academic. Officials are encouraging childbirth because they worry that the fertility rate (the number of children a woman can expect to have during her lifetime) has sunk well below 2.1, the level required to keep the population stable in the long term. They fear a shrinking population will hamper economic growth.</blockquote>
<div style="clear: both;"></div>
</div>
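I'm trying to reach every element, including the ones that have no tag and are just plain text. So when I iterate, I should also see this element: "China’s two-child policy is having unintended consequences Reluctant to pay for multiple maternity leaves, companies are choosing not to hire young women." Here is my code: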
from bs4 import BeautifulSoup

article_soup = BeautifulSoup(article_html, "html.parser")
find_entry_content = article_soup.find('div', class_="post-body entry-content")
# loop over the tags found inside the post body and print each one
for first_parent_tag in find_entry_content.find_all():
    print(first_parent_tag)
Here is the output of the code above:
<br/>
<br/>
<br/>
<iframe allow="autoplay; encrypted-media" allowfullscreen="" frameborder="0" height="573" src="https://www.youtube.com/embed/Fg7jIjmLyWs" width="1019"></iframe>
<br/>
<br/>
<br/>
<br/>
<a href="https://www.economist.com/china/2018/07/26/chinas-two-child-policy-is-having-unintended-consequences">https://www.economist.com/china/2018/07/26/chinas-two-child-policy-is-having-unintended-consequences</a>
<br/>
<blockquote class="tr_bq">
For a generation the government assured women that “one is enough” and that “late marriage and late childbirth are worthy.” Now state media urge them to marry while still in university and remind them that older mothers are more likely to have babies with birth defects, notes Leta Hong Fincher, an author and academic. Officials are encouraging childbirth because they worry that the fertility rate (the number of children a woman can expect to have during her lifetime) has sunk well below 2.1, the level required to keep the population stable in the long term. They fear a shrinking population will hamper economic growth.</blockquote>
<div style="clear: both;"></div>
Answer (score: 3):
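find_all() loops over the tags inside the div; called with no arguments, it returns only Tag objects (every descendant tag, not just direct children). The text you're looking for sits directly inside the div as bare text rather than inside any child tag, so find_all() never returns it.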
<div>
"Some text" # <----- This will be skipped because it isn't an HTML (child) tag in the div. It sits directly in the div
"Some more text" # <----- This will also be skipped for the same reason.
<br/>
<iframe allow= .... >
<br/>
<br/>
<a href ....>
<br/>
<blockquote class="tr_bq">
For a generation the government assured women that .... </blockquote>
# ^ This text is found because it's inside a blockquote, which is one of the tags find_all() returns
<div style="clear: both;"></div>
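To see the difference concretely, here is a small self-contained check against a trimmed, hypothetical excerpt modeled on the post body above (not the full page). It assumes bs4 with the built-in "html.parser": find_all() with no arguments yields only tags, while the bare text sitting directly in the div shows up as NavigableString entries in .contents.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed, hypothetical excerpt modeled on the post body in the question
html = """<div class="post-body entry-content">
Why Kid Market is Booming in China ?<br>
<br>
Birth control: a rule, not a pill
China's two-child policy is having unintended consequences
<br>
<blockquote class="tr_bq">Quoted text</blockquote>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="post-body entry-content")

# find_all() with no arguments returns Tag objects only (searched recursively)
tag_names = [t.name for t in div.find_all()]
print(tag_names)  # ['br', 'br', 'br', 'blockquote']

# .contents holds the direct children of the div, including bare text nodes
text_nodes = [c for c in div.contents if isinstance(c, NavigableString)]
print(any("two-child policy" in t for t in text_nodes))  # True
```

The two-child-policy sentence never appears in tag_names, but it is present among the div's direct text children.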
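So instead of looping over the tags that find_all() returns, look at the div itself: its .contents list (or the .children iterator) holds the bare text nodes that are not wrapped in any child tag, alongside the tags themselves.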
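A minimal sketch of that approach, again run against a trimmed, hypothetical excerpt of the post body rather than the live page. It walks .children, keeps non-blank text nodes and every tag except the `<br>` separators (filtering out `<br>` is an assumption about what counts as an "element" here), and also shows .stripped_strings, which returns every non-blank string in the div, including text nested inside tags.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed, hypothetical excerpt modeled on the post body in the question
html = """<div class="post-body entry-content">
Why Kid Market is Booming in China ?<br>
<br>
Birth control: a rule, not a pill
China's two-child policy is having unintended consequences
<br>
source <a href="https://www.economist.com/">link</a><br>
<blockquote class="tr_bq">Quoted text</blockquote>
</div>"""

soup = BeautifulSoup(html, "html.parser")
find_entry_content = soup.find("div", class_="post-body entry-content")

collected = []
for child in find_entry_content.children:  # direct children only
    if isinstance(child, NavigableString):
        text = child.strip()
        if text:  # skip whitespace-only nodes between tags
            collected.append(text)
    elif isinstance(child, Tag) and child.name != "br":
        # keep the other tags (iframe, a, blockquote, ...) as they are
        collected.append(child)

for item in collected:
    print(item)

# Alternative: every non-blank string in the div, including nested text
all_text = list(find_entry_content.stripped_strings)
```

Iterating .children keeps the original order of text and tags, which is useful when the loose text acts as headings for the tags that follow it; .stripped_strings is simpler if you only need the text.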