Question

I'm trying to get the time from the HTML table below. I can get the tables into a list, but there is a lot of data.

<tr>
<td class="data1">Last update <b class="time">*</b></td>
<td colspan="3">
    <font color="#000000" size="2">10:00 </font><input name="new" type="text" class="myinput"/>
</td>
</tr>

I can't figure out how to parse out the time value.

import bs4 as bs
import requests

source = requests.get('URL')
soup = bs.BeautifulSoup(source.text,'lxml')

table = soup.table
table_rows = table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    for i in td:
        row = [i.text]
        print(row)

I'm trying to store the time in a string so I can use it later.

Answer 1

I think you could try getting the font elements with color #000000 in each row and then extracting the time from them.

Instead of this:

for tr in table_rows:
    td = tr.find_all('td')
    for i in td:
        row = [i.text]
        print(row)

Try this:

for tr in table_rows:
    # Text of every <font color="#000000"> in the row, with the trailing space stripped
    times = [font.text.strip() for font in tr.find_all('font', {'color': '#000000'})]
    print(times)
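Here is a minimal, self-contained sketch of the same approach. It runs against the row from the question (inlined as a string, so no request is made) and keeps the first match in a plain string, which is what the question asks for. The names HTML, times, and last_update are only illustrative, and 'html.parser' is used instead of 'lxml' so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# The row from the question, wrapped in <table> so soup.table finds it.
HTML = """
<table>
<tr>
<td class="data1">Last update <b class="time">*</b></td>
<td colspan="3">
    <font color="#000000" size="2">10:00 </font><input name="new" type="text" class="myinput"/>
</td>
</tr>
</table>
"""

# 'lxml' from the question should behave the same on this snippet.
soup = BeautifulSoup(HTML, "html.parser")
table = soup.table

times = []
for tr in table.find_all("tr"):
    # Collect the text of each black <font> in this row, without surrounding whitespace.
    times.extend(font.get_text(strip=True) for font in tr.find_all("font", {"color": "#000000"}))

# Keep the first match as a string for later use (None if nothing matched).
last_update = times[0] if times else None
print(last_update)
```

On the question's snippet, this prints 10:00. If the real page has other black font elements, times will contain more than one entry, so check which index you need.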

Hope this helps!
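If other rows on the page also use black font elements, a more targeted option is to find the cell whose text starts with "Last update" and read the font in the cell next to it. The sketch below assumes the label and the value stay in adjacent td cells, as they do in the question's snippet. The helper find_last_update is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def find_last_update(html):
    """Hypothetical helper: return the time next to the "Last update" label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The label cell holds "Last update" plus a <b>*</b>, so match on its combined text.
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True).startswith("Last update")
    )
    if label is None:
        return None
    # The value is assumed to sit in the next <td> sibling of the label cell.
    value_cell = label.find_next_sibling("td")
    font = value_cell.find("font") if value_cell else None
    return font.get_text(strip=True) if font else None


HTML = """
<table>
<tr>
<td class="data1">Last update <b class="time">*</b></td>
<td colspan="3">
    <font color="#000000" size="2">10:00 </font><input name="new" type="text" class="myinput"/>
</td>
</tr>
</table>
"""

last_update = find_last_update(HTML)
print(last_update)
```

With the real page, you would pass source.text from the question's requests call instead of the inline HTML string.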

Trying to scrape specific data from an HTML table using Beautiful Soup