Getting text as a list using find_all in BS4

Time: 2017-03-07 15:28:48

Tags: python beautifulsoup bs4 discord

I'll start by saying that I'm new to Python. I've been building a Discord bot with Discord.py and Beautiful Soup 4. Here is where I am:

@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not found!")

This is the output: http://puu.sh/uycBF/1efe173437.png

Now, I have tried several different ways to strip the brackets and HTML tags from this output, but every attempt throws an error. How can I achieve this, or put the data into an array or list and then print only the plain text?

2 Answers:

Answer 0: (score: 0)

BeautifulSoup is returning a list of Tags; the brackets you are seeing come from the list object.

To get them back as a list of strings:
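To see where the brackets come from, here is a minimal sketch that uses plain strings in place of Tag objects (the names are made-up sample data, not taken from the roster page). Converting a list to a string, which is presumably what happens when the whole list is sent as a message, keeps the brackets; joining its elements does not.

```python
# Hypothetical sample data standing in for the text of each <font> tag.
members = ["Alice", "Bob", "Carol"]

# Converting the list itself to a string keeps the brackets and quotes.
as_list_repr = str(members)

# Joining the elements produces one plain string instead.
as_plain_text = ", ".join(members)

print(as_list_repr)
print(as_plain_text)
```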

text = [member.get_text().strip() for member in soupObject.find_all("font", attrs={'size': '4'}) if not member.get_text().encode("utf-8").startswith(b"\xe2")]

Or as a single string:

text = ",".join([member.get_text() for member in soupObject.find_all("font", attrs={'size': '4'}) if not member.get_text().encode("utf-8").startswith(b"\xe2")])
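The `startswith` filter in both snippets can look cryptic. As a rough illustration, assuming the intent is to skip entries that begin with a symbol such as a bullet: every character from U+2000 to U+2FFF encodes to UTF-8 with the lead byte 0xE2. The sample strings and the helper name `starts_with_e2` are hypothetical.

```python
def starts_with_e2(s):
    """Return True if the UTF-8 encoding of s begins with byte 0xE2."""
    return s.encode("utf-8").startswith(b"\xe2")

# Hypothetical entries: the first begins with a bullet (U+2022),
# which encodes to the bytes E2 80 A2.
entries = ["\u2022 Officers", " Alice ", "Bob"]

# Keep only entries that do not start with such a symbol, trimmed.
kept = [e.strip() for e in entries if not starts_with_e2(e)]
print(kept)
```

On Python 3 the comparison has to be made against a bytes literal (`b"\xe2"`), because `bytes.startswith` does not accept a `str` argument.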

Answer 1: (score: 0)

Replace

text = soupObject.find_all("font", attrs={'size': '4'})

with this:

all_font_tags = soupObject.find_all("font", attrs={'size': '4'})
list_of_inner_text = [x.text for x in all_font_tags]
# If you want to print the text as a comma separated string
text = ', '.join(list_of_inner_text)
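For anyone who wants to try this without the bot, the following is a small self-contained sketch of the same idea. The HTML string is hypothetical markup that only imitates the roster page, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the roster page; the real page may differ.
html = """
<html><body>
<font size="4">Alice</font>
<font size="2">not a member entry</font>
<font size="4"> Bob </font>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of Tag objects, not strings.
all_font_tags = soup.find_all("font", attrs={"size": "4"})

# Pull the inner text out of each Tag and trim surrounding whitespace.
list_of_inner_text = [tag.get_text().strip() for tag in all_font_tags]

# Join into one comma separated string that is ready to send as a message.
text = ", ".join(list_of_inner_text)
print(text)
```

In the bot, `text` would then be what gets passed to `self.bot.say` instead of the raw result of `find_all`.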