How do I extract the bold words between strong tags from each row of a column?

Time: 2019-11-14 14:41:56

Tags: python beautifulsoup

I have strings that contain bold words wrapped in strong tags. They sit in a column named notes_0 with 50983 rows. One such row is:

'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Interview 1</strong> and <strong>Scheduled</strong> with Stage Date 05 July, 2018, 3:30 am IST - UTC +05:30.<br/><br/>Rahul has added the following note : "L1 Scheduled on 07/05/18 at 330 AM IST(6 PM EST)".'

I want another column that, at the same row number, contains the words "Interview 1" (that is, the bold words right after "to").

1 answer:

Answer 0 (score: 2)

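Your question is not very clear, but I will work with the information you provided. The following script gives you the words you want for a single row: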

from bs4 import BeautifulSoup

doc = 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Interview 1</strong> and <strong>Scheduled</strong> with Stage Date 05 July, 2018, 3:30 am IST - UTC +05:30.<br/><br/>Rahul has added the following note : "L1 Scheduled on 07/05/18 at 330 AM IST(6 PM EST)".'

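# Parse the HTML fragment with Python's built-in parser.
soup = BeautifulSoup(doc, 'html.parser')
# Collect every <strong> tag in document order.
bold_words = soup.find_all('strong')
# The third tag (index 2) holds the status that follows "to".
print(bold_words[2].text)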
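The script above handles one string, while the question asks about every row of notes_0. A minimal sketch of the per-row version follows, assuming the column can be iterated as plain strings. The helper name `nth_strong_text`, the shortened sample rows, and the `new_column` variable are illustrative and not part of the original question.

```python
from bs4 import BeautifulSoup


def nth_strong_text(html, n=2):
    """Return the text of the n-th (0-based) <strong> tag, or None if absent."""
    tags = BeautifulSoup(html, 'html.parser').find_all('strong')
    return tags[n].get_text(strip=True) if len(tags) > n else None


# Stand-in for the notes_0 column (shortened, made-up rows; the real data has 50983).
notes_0 = [
    'Updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> '
    'to <strong>Interview 1</strong> and <strong>Scheduled</strong>.',
    'A note without any bold text.',
]

# One result per row, so the values line up with the original row numbers.
new_column = [nth_strong_text(row) for row in notes_0]
print(new_column)  # ['Interview 1', None]
```

If the data lives in a pandas DataFrame (an assumption; `df` and `status_to` are hypothetical names), the same helper could be applied with `df['status_to'] = df['notes_0'].apply(nth_strong_text)`.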

Some comments:

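  • You did not say whether the number of "bold" tags is the same in every row, so blindly accessing index 2 is risky.
  • This example comes almost straight from the front page of the BeautifulSoup documentation, and you did not show any code you had already tried. That leads me to believe you did not read the documentation and came here for a quick answer instead. In the future, at least show your existing code and any error messages.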
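On the first point, one way to avoid a fixed index is to look for the strong tag whose preceding text ends with the word "to". This is only a sketch under that assumption about the row format. The function name `strong_after_to` and the second sample string are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString


def strong_after_to(html):
    """Return the text of the first <strong> tag directly preceded by the word 'to'."""
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup.find_all('strong'):
        before = tag.previous_sibling
        # Only plain text nodes can end with the word "to".
        if isinstance(before, NavigableString):
            words = before.split()
            if words and words[-1] == 'to':
                return tag.get_text(strip=True)
    # No matching tag: the row does not follow the assumed "... to <strong>X</strong>" shape.
    return None


doc = ('The Candidate Status has now been updated from <strong>CV Submitted</strong> '
       'and <strong>Feedback Pending</strong> to <strong>Interview 1</strong> '
       'and <strong>Scheduled</strong> with Stage Date 05 July, 2018.')
print(strong_after_to(doc))  # Interview 1

# Made-up row with a different number of bold tags.
short_doc = 'Status changed to <strong>Offer</strong>.'
print(strong_after_to(short_doc))  # Offer
```

Rows that return None are worth inspecting by hand, since they may use a wording other than "to".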