I have a field in a Django project whose content (from a QuerySet) looks like this:
<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 1</b><br />
Work Title1 <br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 2</b><br />
Work Title 2<br /><span class="text-spacer"></span>
</p>
But I want the text in this format, with name and title joined by " - ":
Name and LastName - Work Title
Name and LastName 1 - Work Title1
Name and LastName 2 - Work Title 2
Here is my code, but it only gives me the first item, and I want a list containing all of the items:
text_list = self.texts.filter(code='ON')
for i in text_list:
    soup = BeautifulSoup(i.text_en, "html.parser")
    aa = soup.p.get_text(separator=" - ", strip=True)
return [aa]
Answer (score: 1)
You need to iterate over all of the p tags. Based on the sample you provided, you can try something like this:
from bs4 import BeautifulSoup

source = """<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 1</b><br />
Work Title1 <br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 2</b><br />
Work Title 2<br /><span class="text-spacer"></span>
</p>
"""
soup = BeautifulSoup(source, 'lxml')
ary = [p.get_text(separator=' - ', strip=True) for p in soup.find_all('p')]
ary
will then be:
[u'Name and LastName - Work Title',
u'Name and LastName 1 - Work Title1',
u'Name and LastName 2 - Work Title 2']
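To apply the same idea to the Django code in the question, the results from every object in the QuerySet need to be collected into one list, and `find_all('p')` should be used instead of `soup.p` (which is shorthand for `soup.find('p')` and therefore returns only the first `<p>`). The following is a minimal sketch, assuming each object's `text_en` holds one or more `<p>` blocks shaped like the sample above. The helper name `extract_name_title_pairs` is hypothetical, and it uses Python's built-in `"html.parser"` so that lxml does not need to be installed:

```python
from bs4 import BeautifulSoup


def extract_name_title_pairs(html_fragments):
    """Hypothetical helper: return one 'Name - Title' string per <p> across all fragments."""
    results = []
    for html in html_fragments:
        # "html.parser" ships with Python, so lxml is not required
        soup = BeautifulSoup(html, "html.parser")
        # find_all("p") returns every <p>; soup.p would return only the first one
        for p in soup.find_all("p"):
            # strip=True drops the whitespace-only strings around <br /> and <span>
            results.append(p.get_text(separator=" - ", strip=True))
    return results


# Inside the model method from the question (self.texts and text_en come from there),
# this could be used roughly as:
#     return extract_name_title_pairs(t.text_en for t in self.texts.filter(code='ON'))

if __name__ == "__main__":
    fragments = [
        # The first fragment contains two <p> blocks, like a single field value would
        '<p><b>Name and LastName</b><br />\nWork Title<br /><span class="text-spacer"></span>\n</p>'
        '<p><b>Name and LastName 1</b><br />\nWork Title1 <br /><span class="text-spacer"></span>\n</p>',
        '<p><b>Name and LastName 2</b><br />\nWork Title 2<br /><span class="text-spacer"></span>\n</p>',
    ]
    print(extract_name_title_pairs(fragments))
```

Because the helper only takes an iterable of HTML strings, it can be tested without a database and then fed a generator over the QuerySet inside the Django method.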