Getting a list of all items from a QuerySet with BeautifulSoup

Date: 2016-10-25 07:09:33

Tags: python html regex django beautifulsoup

I have a field on a Django object (taken from a QuerySet) with the following content:

<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 1</b><br />
Work Title1 <br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 2</b><br />
Work Title 2<br /><span class="text-spacer"></span>
</p>

But I want the text in this format, with " - " as the separator:

Name and LastName - Work Title
Name and LastName 1 - Work Title1
Name and LastName 2 - Work Title 2

Here is my code. It returns only the first item, but I want a list containing all of them:

text_list = self.texts.filter(code='ON')
for i in text_list:
    soup = BeautifulSoup(i.text_en, "html.parser")
    aa = soup.p.get_text(separator=" - ", strip=True)
return [aa]

1 answer:

Answer 0 (score: 1)

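You need to iterate over the p tags: soup.p returns only the first p element, while find_all('p') returns all of them. Based on the sample you provided, you could try something like this: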

from bs4 import BeautifulSoup

source = """<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 1</b><br />
Work Title1 <br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 2</b><br />
Work Title 2<br /><span class="text-spacer"></span>
</p>
"""
soup = BeautifulSoup(source, 'lxml')
ary = [p.get_text(separator=' - ', strip=True) for p in soup.find_all('p')]

ary will be:

[u'Name and LastName - Work Title',
 u'Name and LastName 1 - Work Title1',
 u'Name and LastName 2 - Work Title 2']
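
To tie this back to the loop in the question, here is a minimal sketch that collects the lines from every item instead of keeping only the last value of the loop variable. The helper name paragraphs_to_lines and the fragments list are hypothetical stand-ins (the list plays the role of the text_en values from self.texts.filter(code='ON')), and "html.parser" is used here only so that the sketch does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup


def paragraphs_to_lines(html_fragments, separator=" - "):
    """Return one joined string per <p> element across all HTML fragments.

    Hypothetical helper: html_fragments stands in for the text_en values
    of the QuerySet items in the question.
    """
    lines = []
    for html in html_fragments:
        soup = BeautifulSoup(html, "html.parser")
        # find_all("p") visits every <p>; soup.p would give only the first.
        for p in soup.find_all("p"):
            text = p.get_text(separator=separator, strip=True)
            if text:  # skip paragraphs that contain no text
                lines.append(text)
    return lines


# Assumed sample data, shaped like the field content in the question.
fragments = [
    '<p><b>Name and LastName</b><br />\n'
    'Work Title<br /><span class="text-spacer"></span>\n</p>',
    '<p><b>Name and LastName 1</b><br />\n'
    'Work Title1 <br /><span class="text-spacer"></span>\n</p>'
    '<p><b>Name and LastName 2</b><br />\n'
    'Work Title 2<br /><span class="text-spacer"></span>\n</p>',
]
result = paragraphs_to_lines(fragments)
```

Inside the original method, this would presumably be called as paragraphs_to_lines(i.text_en for i in text_list) and its result returned directly, rather than returning [aa] after the loop has finished.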