How to remove a target tr block using BeautifulSoup

Asked: 2019-05-03 11:24:30

Tags: python web-scraping beautifulsoup

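I want to remove a target tr block by its text. When I run my code the output is fine except for one problem: it also scrapes the header row <tr><td>Domain</td><td>Last Resolved Date</td></tr>. I do not want this row in the output. How can I remove it? My code is below.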

Update: fixed.

Old code:

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = "https://viewdns.info/reverseip/?host=github.com&t=1"
text = requests.get(url, headers=headers).text
soup = BeautifulSoup(text, 'html.parser')

table = soup.find('table', attrs={'border':'1'})
domain = table.findAll('td', attrs={'align':None})

for line in domain:
    print(line.text)

Fixed code:

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = "https://viewdns.info/reverseip/?host=github.com&t=1"
text = requests.get(url, headers=headers).text
soup = BeautifulSoup(text, 'html.parser')

table = soup.find('table', attrs={'border':'1'})
domain = table.findAll('td', attrs={'align':None})[2:]

for line in domain:
    print(line.text)

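2 answers:

Answer 0 (score: 0)

Filter out the first two objects in the domain variable. These are the two header cells (Domain and Last Resolved Date), which come first in the result list: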

domain = table.findAll('td', attrs={'align':None})[2:]
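
To see why the slice works, here is a minimal offline sketch. The SAMPLE_HTML string is a hypothetical stand-in for the viewdns.info table (the real markup may differ); it assumes the date cells carry an align attribute while the header cells and domain cells do not, which is what the attrs={'align': None} filter in the question relies on.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real result table;
# the actual page's markup is an assumption here.
SAMPLE_HTML = """
<table border="1">
  <tr><td>Domain</td><td>Last Resolved Date</td></tr>
  <tr><td>example.com</td><td align="center">2019-05-01</td></tr>
  <tr><td>example.org</td><td align="center">2019-05-02</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"border": "1"})

# attrs={"align": None} matches only <td> cells without an align attribute.
cells = table.find_all("td", attrs={"align": None})
all_text = [td.text for td in cells]

# The header row contributes the first two matches, so slice them off.
domains = [td.text for td in cells[2:]]

print(all_text)
print(domains)
```

The slice depends on the header cells always being the first two matches, so it would need adjusting if the table gained or lost a header column.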

Answer 1 (score: 0)

Try this code.

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = "https://viewdns.info/reverseip/?host=github.com&t=1"
text = requests.get(url, headers=headers).text
soup = BeautifulSoup(text, 'html.parser')

table = soup.find('table', attrs={'border':'1'})
domain = table.findAll('td', attrs={'align':None})[2:]

for line in domain:
    print(line.text)
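
If the goal is to delete the header tr itself rather than skip its cells by position, one possible alternative is to find the row by its text and call decompose() on it. This is only a sketch on a hypothetical SAMPLE_HTML snippet (an assumed approximation of the real page), not the method used in the answers above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real result table;
# the actual page's markup is an assumption here.
SAMPLE_HTML = """
<table border="1">
  <tr><td>Domain</td><td>Last Resolved Date</td></tr>
  <tr><td>example.com</td><td align="center">2019-05-01</td></tr>
  <tr><td>example.org</td><td align="center">2019-05-02</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"border": "1"})

# Look for the row whose first cell reads "Domain" and remove it from the tree.
for row in table.find_all("tr"):
    first_cell = row.find("td")
    if first_cell is not None and first_cell.get_text(strip=True) == "Domain":
        row.decompose()  # deletes the <tr> and everything inside it
        break

# With the header row gone, no slicing is needed.
domains = [
    td.get_text(strip=True)
    for td in table.find_all("td", attrs={"align": None})
]
remaining_text = table.get_text()

print(domains)
```

Matching on the cell text keeps working if extra rows are added above the data, but it breaks if the site renames the Domain header.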