BeautifulSoup: get the first value using string/text

Date: 2016-07-28 13:25:56

Tags: python beautifulsoup html-parsing

BeautifulSoup is very handy for HTML parsing in Python, but I am having trouble writing clean code that gets a cell's value directly using string or text:
from bs4 import BeautifulSoup
tr ="""    
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    td = row.findAll("td")
    print td[0].text
    print td[0].string

The output is:

text1
text1
text2abc
None

How can I get this result instead?

text1
text2

I want to skip the extra inner tag.

I am using BeautifulSoup 4 (bs4) with Python 2.

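2 Answers:

Answer 0 (score: 2)

You can simply use the find() function, setting its text and recursive arguments: text=True matches only strings, and recursive=False restricts the search to the direct children of the td.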

for row in table.findAll("tr"):
    td1 = row.td.find(text=True, recursive=False)
    print str(td1)

Your output will be:

text1
text2

This works regardless of where the div tag sits inside the cell. See the example below.

>>> tr ="""    
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
    <tr><td><div>abc</div>text3</td></tr>
</table>
"""
>>> table = BeautifulSoup(tr,"html.parser")
>>> for row in table.findAll("tr"):
        td1 = row.td.find(text=True, recursive=False)
        print str(td1)


text1
text2
text3
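
For readers on Python 3, the same idea can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name direct_text is made up for this example, and it passes string=True, which is the name Beautiful Soup 4.4.0 and later use for the argument that the answer above spells text=True.

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
    <tr><td><div>abc</div>text3</td></tr>
</table>
"""


def direct_text(cell):
    """Hypothetical helper: first string that is a direct child of `cell`.

    string=True matches text nodes only, and recursive=False keeps the
    search from descending into nested tags such as the <div>.
    """
    node = cell.find(string=True, recursive=False)
    return None if node is None else str(node)


soup = BeautifulSoup(html, "html.parser")
# row.td is the first <td> inside each row.
first_values = [direct_text(row.td) for row in soup.find_all("tr")]
print(first_values)
```

By contrast, .string is None whenever a tag has more than one child, and .text concatenates every descendant string, which is why the loop in the question printed None and text2abc for the second row.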

Answer 1 (score: 1)

You can try this:

for row in table.findAll("tr"):
    td = row.findAll("td")
    t = td[0]
    print t.contents[0]

But this only works when