Question

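I'm a beginner. I'm trying to scrape a table. I can grab the whole parent tag with Beautiful Soup, but I don't know how to traverse its child tags and pull out the inner text.

Here is my code: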

soup = BeautifulSoup(htmltext, "html.parser")
tables = soup.find('td',attrs={'class':'title_heading'})
for table in tables:
    print(table)
    form_name = table.td.center.strong.u.text  # <-- error raised here

The code above prints everything inside the <td> tag. The error occurs when I try to traverse the child tags.

File "E:\Study_naveen\python\scrape.py", line 23, in <module>
form_name = table.td.center.strong.u.text
AttributeError: 'NoneType' object has no attribute 'center'

Here is my HTML:

<td width="615" class="title_heading"><center>
<strong><u> ONLINE REGISTRATION FORM</u></strong>
<br><br>
<strong>Blah<br>
123456789-<br>
blah blah<br>
phone - 123456789
999999999<br>
Email : something@gmail.com.</strong>

I want to get the text "ONLINE REGISTRATION FORM" from inside it. How can I do that?

Answer 1

import bs4

html = '''<td width="615" class="title_heading"><center>
<strong><u> ONLINE REGISTRATION FORM</u></strong>
<br><br>
<strong>Blah<br>
123456789-<br>
blah blah<br>
phone - 123456789
999999999<br>
Email : something@gmail.com.</strong>'''

# Find the <td> by its class, then take the text of the first <strong> inside it.
soup = bs4.BeautifulSoup(html, 'lxml')
text = soup.find('td', class_="title_heading").find('strong').text
print(text)

Output:

 ONLINE REGISTRATION FORM
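
As for why the original loop failed, my reading is this: find() returns a single Tag rather than a list, so "for table in tables" walks over the children of the <td>. The first child is the <center> tag, which has no <td> inside it, so table.td is None and .center raises the AttributeError. Below is a minimal sketch of that, using a shortened copy of the HTML from the question and the built-in "html.parser" so that lxml is not required. The names cell, child_names and form_name are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question.
html = """<td width="615" class="title_heading"><center>
<strong><u> ONLINE REGISTRATION FORM</u></strong>
<br><br>
<strong>Blah<br>
phone - 123456789</strong>"""

soup = BeautifulSoup(html, "html.parser")

# find() returns one Tag (or None if nothing matches), not a list.
cell = soup.find("td", class_="title_heading")

# Iterating over a Tag yields its direct children, not matching <td> tags.
child_names = [child.name for child in cell]
print(child_names)

# Navigate down from the <td> itself; there is no nested <td> to go through.
heading = cell.find("u")
form_name = heading.get_text(strip=True) if heading is not None else None
print(form_name)
```

get_text(strip=True) also drops the leading space that shows up in the output above. If several cells can match, find_all() returns a list that is safe to loop over.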
