How to get these values using Python

Time: 2017-05-16 17:46:42

Tags: python

I have some HTML source like this:

<tr>
    <td class="upl">XXXX</td>
    <td class="upl">XXXX</td>
    <td class="upl">XXXX</td>
    <td class="up">XXXX</td>
    <td>9.09</td>
    <td class="upl">XXXX</td>
    <td class="dn">XXXX</td>
    <td>XXXX</td>
    <td>XXXX</td>
    <td>XXXX</td>
    <td class="up">XXXX</td>
    <td class="up">XXXX</td>
    <td class="up">XXXX</td>
    <td class="dn">XXXX</td>
    <td class="up">XXXX</td>
</tr>
<tr>

    <td class="upl">XXXX</td>
    <td class="upl">XXXX</td>
    <td class="upl">XXXX</td>
    <td class="up">XXXX</td>
    <td>XXXX</td>
    <td class="upl">XXXX</td>
    <td class="up">XXXX</td>
    <td>XXXX</td>
    <td>XXXX</td>
    <td>XXXX</td>
    <td class="up">XXXX</td>
    <td class="up">XXXX</td>
    <td class="up">XXXX</td>
    <td class="dn">XXXX</td>
    <td class="up">XXXX</td>
</tr>

How can I get all of the XXXX values using BeautifulSoup 4? My current code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("URL")
bsObj = BeautifulSoup(html, "html.parser")

nameList2 = bsObj.findAll("td")  # this only shows all of the information

for name in nameList2:
    print(name.get_text())

1 answer:

Answer 0: (score: 0)

BeautifulSoup.contents

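Every tag returned by BeautifulSoup has a contents attribute (note: an attribute, not a method) that you can use instead of get_text(). It holds a list of the tag's direct children: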

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("URL")
bsObj = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
nameList2 = bsObj.findAll("td")   # every <td> tag in the document
for name in nameList2:
    print(name.contents)  # a list such as ['XXXX'] (shown as [u'XXXX'] on Python 2)
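
To make the difference between contents and get_text() concrete, here is a minimal sketch that runs without fetching a page. The SAMPLE_HTML string and its cell values (A1, B3, and so on) are made up for illustration and stand in for the real table; the empty cell and the nested <b> tag are assumptions added to show the edge cases.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the page fetched with urlopen("URL").
SAMPLE_HTML = """
<table>
<tr><td class="upl">A1</td><td>9.09</td><td class="dn">A3</td></tr>
<tr><td class="up">B1</td><td></td><td class="dn"><b>B3</b></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cells = soup.find_all("td")

# contents is the list of direct children: text nodes and/or nested tags.
contents_view = [cell.contents for cell in cells]

# get_text() flattens everything inside the tag into one string.
text_view = [cell.get_text(strip=True) for cell in cells]

for raw, text in zip(contents_view, text_view):
    print(raw, "->", repr(text))
```

In this sample the empty cell has an empty contents list and the last cell's contents holds a <b> tag rather than a string, whereas get_text() returns a plain string in every case.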

In my tests the list always had a length of 1, so you can use:

    print(name.contents[0])  # prints XXXX (the element itself is a NavigableString)

To get a plain string (and, on Python 2, get rid of the u prefix), call str() on it, which invokes its __str__ method:

    print(str(name.contents[0]))  # prints XXXX
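
Indexing contents[0] raises an IndexError if a cell happens to be empty, so a slightly more defensive variant may be useful. The sketch below is one possible approach, not the only one: it groups the values by row and uses get_text(strip=True) for each cell. The function name extract_rows and the SAMPLE_HTML data are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return the text of every <td>, grouped by the <tr> it belongs to."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text() works for empty cells and for cells with nested tags.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that contain no <td> at all
            rows.append(cells)
    return rows


# Hypothetical sample standing in for the real page.
SAMPLE_HTML = """
<tr><td class="upl">A1</td><td>9.09</td><td class="dn">A3</td></tr>
<tr><td class="up">B1</td><td></td><td class="dn"><b>B3</b></td></tr>
"""

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

With real data, the html argument would be the response from urlopen("URL") instead of the sample string.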

I hope this helps with whatever you are doing. Good luck!