How do I scrape 80 and 443 from the tags?

Time: 2019-05-20 19:48:44

Tags: python

I am trying to scrape the values 80 and 443 from the tags below.

from bs4 import BeautifulSoup as bs

<ul class="ports">
<li><a href="#80">80</a>
</li>
<li><a href="#443">443</a>
</li>
</ul>
<a><div class="state">http</div><a href="http://localhost:80" target="_blank" class="link"><i class="fa fa-mail-forward">&nbsp;
</i></a>

1 answer:

Answer 0 (score: 0)

# If you're looking to parse an .html file

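from bs4 import BeautifulSoup

# Read the local HTML file and parse it
with open('test.html') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

# attrs must be a dict ({'class': 'ports'}), not a set ({'class', 'ports'})
ul = soup.find('ul', {'class': 'ports'})
links = ul.find_all('a')

# Collect the text of each link, e.g. '80' and '443'
ports = []
for port in links:
    ports.append(port.string)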

# If you're looking to parse a website

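from bs4 import BeautifulSoup
import requests

session = requests.Session()
endpoint = LINK  # placeholder: set this to the URL of the page to scrape
response = session.get(endpoint)
soup = BeautifulSoup(response.text, 'html.parser')

# attrs must be a dict ({'class': 'ports'}), not a set ({'class', 'ports'})
ul = soup.find('ul', {'class': 'ports'})
links = ul.find_all('a')

# Collect the text of each link, e.g. '80' and '443'
ports = []
for port in links:
    ports.append(port.string)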
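
For a quick check without a file or a network request, the same lookup can be run against the snippet from the question. This is only a minimal sketch: the helper name extract_ports and the SAMPLE_HTML constant are made up for illustration, and it assumes every port number is the text of an <a> inside <ul class="ports">. It returns an empty list when that list is missing rather than raising an AttributeError on None.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (only the part that holds the ports).
SAMPLE_HTML = """
<ul class="ports">
<li><a href="#80">80</a>
</li>
<li><a href="#443">443</a>
</li>
</ul>
"""


def extract_ports(html):
    """Hypothetical helper: return the port numbers found in <ul class="ports">."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="ports")
    if ul is None:
        # No matching list on the page, so there is nothing to collect.
        return []
    ports = []
    for link in ul.find_all("a"):
        text = link.get_text(strip=True)
        # Skip links whose text is not a plain number.
        if text.isdigit():
            ports.append(int(text))
    return ports


print(extract_ports(SAMPLE_HTML))  # prints [80, 443]
```

The ports come back as integers here; drop the int() call if you would rather keep them as strings, as in the answer above.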