I have some HTML source like this:
<tr>
<td class="upl">XXXX</td>
<td class="upl">XXXX</td>
<td class="upl">XXXX</td>
<td class="up">XXXX</td>
<td>9.09</td>
<td class="upl">XXXX</td>
<td class="dn">XXXX</td>
<td>XXXX</td>
<td>XXXX</td>
<td>XXXX</td>
<td class="up">XXXX</td>
<td class="up">XXXX</td>
<td class="up">XXXX</td>
<td class="dn">XXXX</td>
<td class="up">XXXX</td>
</tr>
<tr>
<td class="upl">XXXX</td>
<td class="upl">XXXX</td>
<td class="upl">XXXX</td>
<td class="up">XXXX</td>
<td>XXXX</td>
<td class="upl">XXXX</td>
<td class="up">XXXX</td>
<td>XXXX</td>
<td>XXXX</td>
<td>XXXX</td>
<td class="up">XXXX</td>
<td class="up">XXXX</td>
<td class="up">XXXX</td>
<td class="dn">XXXX</td>
<td class="up">XXXX</td>
</tr>
How can I get all the XXXX values with BeautifulSoup 4? My current code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("URL")
bsObj = BeautifulSoup(html, "html.parser")
nameList2 = bsObj.findAll("td")  # this only shows all the information
for name in nameList2:
    print(name.get_text())
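For reference, here is a minimal self-contained sketch that parses a trimmed copy of the HTML above from a string (instead of urlopen("URL"), which is only a placeholder) and collects the cell text, both grouped by row and as one flat list. The SAMPLE_HTML string and the rows/values names are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the table rows from the question
# (not the full page); parsed from a string so no network access is needed.
SAMPLE_HTML = """
<tr>
<td class="upl">XXXX</td>
<td class="up">XXXX</td>
<td>9.09</td>
<td class="dn">XXXX</td>
</tr>
<tr>
<td class="upl">XXXX</td>
<td>XXXX</td>
</tr>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One inner list per <tr>; strip=True trims surrounding whitespace.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.find_all("tr")
]

# Flat list of every cell value, in document order.
values = [td.get_text(strip=True) for td in soup.find_all("td")]

print(rows)
print(values)
```

find_all is the PEP 8 name for the method; findAll, used in the question, is an older alias for it.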
Answer (score: 0)
A BeautifulSoup tag has an attribute called contents (note: an attribute, not a method) that you can use instead of get_text():
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("URL")
bsObj = BeautifulSoup(html, "html.parser")
nameList2 = bsObj.findAll("td")  # finds every <td> cell
for name in nameList2:
    print(name.contents)  # a list of the cell's children, e.g. ['XXXX']
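To see what contents actually holds, here is a small sketch on three hypothetical cells (not taken from the original page): a plain text cell, a cell mixing text with a child tag, and an empty cell. It illustrates that contents is a list of a tag's direct children, which can be strings (NavigableString) or tags.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical cells, not from the original page: one plain text cell,
# one cell with mixed text and a child tag, and one empty cell.
html = '<tr><td class="up">XXXX</td><td>9.09 <b>x</b></td><td></td></tr>'
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

for td in cells:
    # .contents is an attribute holding a list of the tag's direct children
    print(td.contents)
```

So the list has length 1 only when the cell contains exactly one child; a cell with nested markup has more children, and an empty cell has none.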
In my tests the list always had length 1, so inside the loop you can use:
print(name.contents[0])  # the first child, a NavigableString; prints XXXX
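The length-1 assumption can fail on real pages. A minimal sketch of a guarded version, assuming the same kind of hypothetical cells as before; first_child_text is an illustrative helper, not a BeautifulSoup function. It also shows the built-in .string attribute, which returns None unless the tag has exactly one child.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: a normal one, an empty one, and one with mixed content.
html = '<tr><td class="up">XXXX</td><td></td><td>9.09 <b>x</b></td></tr>'
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")


def first_child_text(td):
    """Hypothetical helper: return the first child of td, or "" if it has none.

    td.contents[0] on its own raises IndexError for an empty cell.
    """
    return td.contents[0] if td.contents else ""


firsts = [first_child_text(td) for td in cells]

# .string is the single child string, or None when a tag has no children
# or more than one child.
strings = [td.string for td in cells]

print([str(v) for v in firsts])
print(strings)
```

Note that for the mixed cell, contents[0] returns only the leading text "9.09 ", while get_text() would return all of the cell's text.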
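If you need a plain string rather than a NavigableString (for example, to get rid of the u'' prefix that Python 2 shows when printing a list of unicode strings), call str(), which invokes the object's __str__ method:
print(str(name.contents[0]))  # --> XXXX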
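A short sketch of what str() changes, using a single hypothetical cell: the child is a NavigableString, a str subclass that also keeps links into the parse tree, and str() produces an ordinary str. On Python 3 there is no u'' prefix in the first place; that prefix only appears in Python 2 reprs.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical single cell, parsed from a string.
soup = BeautifulSoup('<td class="up">XXXX</td>', "html.parser")
first_child = soup.td.contents[0]

# The child is a NavigableString, a subclass of str that also keeps
# references into the parse tree (parent, siblings, ...).
print(type(first_child).__name__)
print(isinstance(first_child, str))

# str() returns a plain str copy with no tree references attached.
plain = str(first_child)
print(type(plain).__name__)
print(repr(plain))
```

Converting to a plain str is mainly useful if you keep the values after discarding the soup, since a NavigableString holds a reference to its parse tree.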
Hope this helps with whatever you're doing. Good luck!