Question

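Honestly, I'm finding BeautifulSoup too difficult, and the documentation doesn't explain the basics I'm looking for.

I'm trying to return the string inside a tag that has an attribute: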

<span class="on">6220</span>

But running this:

def fetch_online():
    users = page('span', {'class' : 'on'})
    return str(users)

gives me [<span class="on">6220</span>]. So I assume I'm doing something wrong. What's the way to get a simple string out of a tag?

Answer 1

You can do it like this:

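from bs4 import BeautifulSoup

html = '<span class="on">6220</span>'  # your HTML source goes here
soup = BeautifulSoup(html, 'html.parser')
x = soup.find('span', {'class': 'on'})  # first matching tag
print(x.text)         # all the text inside the tag
print(x.string)       # the tag's single string child
print(x.contents[0])  # the tag's first child node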
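
The three attributes agree here only because the span holds a single string. A minimal sketch of how they differ once a tag has nested children, assuming the bs4 package (BeautifulSoup 4); the markup and the class "off" are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second span contains a nested <b> tag.
html = '<span class="on">6220</span><span class="off">12 <b>idle</b></span>'
soup = BeautifulSoup(html, 'html.parser')

simple = soup.find('span', {'class': 'on'})
nested = soup.find('span', {'class': 'off'})

# With a single string child, all three attributes agree.
simple_values = (simple.text, str(simple.string), str(simple.contents[0]))

# With mixed children, .text joins every string, .string is None,
# and .contents[0] is only the first child node.
nested_values = (nested.text, nested.string, str(nested.contents[0]))

print(simple_values)
print(nested_values)
```

So .text is usually the safest choice when you only want the visible text, while .string is worth checking for None before use.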

Answer 2

Indeed, BeautifulSoup isn't that easy to understand, but it can be really useful at times ;)

So, taking FlopCoder's example again and explaining it a bit more:

from bs4 import BeautifulSoup

html = '<span class="on">6220</span>'  # HTML code, perhaps fetched from a website
soup = BeautifulSoup(html, 'html.parser')  # create a soup object from your HTML code
x = soup.find('span', {'class': 'on'})  # search for the first span tag with class "on"
print(x.text)  # .text is only the text inside the tag: <tag>text</tag>

If you have more than one tag to find, do this:

x = soup.findAll('span', {'class': 'on'})  # list of every matching span
for span in x:
    print(span.text)

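This last example uses findAll (spelled find_all in current bs4). It builds a list of all the span tags in the code that have class "on", which you can then loop over with for.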
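
As a rough sketch of the same idea that collects the strings into a list instead of printing them, assuming bs4 is installed and using made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with several spans, two of which have class "on".
html = ('<span class="on">6220</span>'
        '<span class="off">3</span>'
        '<span class="on">118</span>')
soup = BeautifulSoup(html, 'html.parser')

# find_all is the current spelling of findAll; the class_ keyword avoids
# the clash with Python's reserved word "class".
counts = [span.get_text() for span in soup.find_all('span', class_='on')]
print(counts)
```

If nothing matches, find_all returns an empty list, so the comprehension simply yields an empty result.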

your_object.text -> returns the text

your_object.a -> returns the first link (<a> tag) inside it (and so on...)
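
A small illustration of that attribute-style navigation, assuming bs4; the div, the class "user", and the URL are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a div holding a single link.
html = '<div class="user"><a href="/profile/42">alice</a></div>'
soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div', {'class': 'user'})
link = div.a               # first <a> tag inside the div, or None if absent
link_text = link.text      # the text inside the link
link_href = link['href']   # attributes are read like dictionary keys
print(link_text, link_href)
```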

Hope it helps a little!

Answer 3

Replace

return str(users)

with

return users[0].string

or

return users[0].contents

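The page('span ... call is actually shorthand for calling the find_all() function, which returns a list. So you first index into that list to get the tag, then get its contents. Running Python's str() function on the list gives you the whole thing, tags included; you need the BeautifulSoup functions to get a tag's string.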
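
Putting that together, one possible corrected version of the function from the question. This is only a sketch: the `page` object here is an assumed stand-in built from the sample span, and the empty-result check is an addition not present in the original.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the asker's `page` object.
page = BeautifulSoup('<span class="on">6220</span>', 'html.parser')

def fetch_online():
    # Calling the soup object is shorthand for page.find_all(...).
    users = page('span', {'class': 'on'})
    if not users:  # no matching tag was found
        return None
    # Index the list first, then convert the tag's string to a plain str.
    return str(users[0].string)

print(fetch_online())
```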

Grabbing a simple string from the first tag
