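I want to use Beautiful Soup to print the CVE IDs "CVE-2013-2566" and "CVE-2015-2808" from the References section, together with "tcp/23", all of which belong to the "Unencrypted Telnet Server" finding. I can't work out the logic for this.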
<div xmlns="" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;">42263 - Unencrypted Telnet Server</div>
<div xmlns="" style="margin: 0 0 45px 0;">
<div class="details-header">Risk Factor<div class="clear"></div>
</div>
<div style="line-height: 20px; padding: 0 0 20px 0;">Medium<div class="clear"></div>
<div class="details-header">Plugin Information: <div class="clear"></div>
</div>
<div style="line-height: 20px; padding: 0 0 20px 0;">Published: 2009/10/27, Modified: 2015/10/21<div class="clear"></div>
</div>
<div class="details-header">**References**<div class="clear"></div>
</div>
<div id="idm8894160" style="display: block;" class="table-wrapper see-also">
<table cellpadding="0" cellspacing="0">
<thead><tr>
<th width="15%"></th>
<th width="85%"></th>
</tr></thead>
<tbody>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff"><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-2566" target="_blank">CVE-2013-2566</a></td>
</tr>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff"><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-2808" target="_blank">CVE-2015-2808</a></td>
</tr>
</tbody>
<div class="details-header">Plugin Output<div class="clear"></div>
</div>
<h2>tcp/23</h2>
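This is what I have written so far; I am stuck at the point marked with a comment. I am a beginner with bs4, so please bear with me. I have to submit the report tomorrow, please help.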
from bs4 import BeautifulSoup
import csv
import urllib.request as urllib2

with open(r"C:\Users\sourabhk076\Documents\CHIDRMUM_DR8016CHI1_CTSINWDB01_9xtqpj.html") as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')

f = csv.writer(open("Report.csv", "w"))
f.writerow(["Observation", "Port", "CVE-ID"])
medium = soup.find_all('div', attrs={'style':'box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;'})
####this will search for text "Unencrypted telnet server"####
for x in medium:
    port = x.find('h2')
    cve = x.find('div', class_='table-wrapper see-also').findAll('tr')
    ######## don't know what to do next #############
    obsv = x.text
    portd = port.text
    print([obsv,portd,cve])
Answer 0 (score: 0)
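You can search for child tags within a tag. Maybe something like the following, where cve is the table-wrapper div itself rather than the list returned by .findAll('tr') (a result list has no find method):
tbody = cve.find("tbody")
for row in tbody.find_all("tr"):
    print(row.find_all("td")[1].text)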
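To make the suggestion above concrete, here is a minimal sketch that runs against a trimmed copy of the References table from the question. The HTML string and the extract_cve_ids helper are hypothetical names introduced only for this example, and the sketch assumes every CVE sits in the second cell of a row whose first cell reads "CVE".

```python
from bs4 import BeautifulSoup

# Trimmed copy of the References table from the question.
# Assumption: the real report uses the same row layout.
HTML = """
<div class="table-wrapper see-also">
<table>
<tbody>
<tr><td>CVE</td><td><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-2566">CVE-2013-2566</a></td></tr>
<tr><td>CVE</td><td><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-2808">CVE-2015-2808</a></td></tr>
</tbody>
</table>
</div>
"""


def extract_cve_ids(wrapper):
    """Return the text of the second cell of every 'CVE' row (hypothetical helper)."""
    ids = []
    for row in wrapper.find("tbody").find_all("tr"):
        cells = row.find_all("td")
        # Skip rows that do not have the expected label/value pair.
        if len(cells) >= 2 and cells[0].get_text(strip=True) == "CVE":
            ids.append(cells[1].get_text(strip=True))
    return ids


soup = BeautifulSoup(HTML, "html.parser")
# Keep the wrapper element itself, not a list of its rows.
wrapper = soup.find("div", class_="table-wrapper see-also")
cve_ids = extract_cve_ids(wrapper)
print(cve_ids)
```

Checking the first cell is optional; it only guards against other reference types that might share the same table.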
Answer 1 (score: 0)
Code:
from bs4 import BeautifulSoup

with open('/path/to/some.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

service = soup.find('div', style='box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;').get_text(strip=True)
cve_ids = [cve_elem.text for cve_elem in soup.select('table > tbody > tr > td > a')]
protocol, port = soup.select_one('table > h2').text.split('/')
print('{}, {}/{}, CVE-IDs: {}'.format(service, protocol, port, cve_ids))
Output:
42263 - Unencrypted Telnet Server, tcp/23, CVE-IDs: ['CVE-2013-2566', 'CVE-2015-2808']
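A real report usually lists several findings, and selectors applied to the whole document would then mix their CVE IDs together. The sketch below shows one possible way to scope the lookup to each finding and build the CSV rows the question asks for. It rests on assumptions: each title div is taken to be immediately followed by a sibling div holding that finding's details, the classes "title" and "body" are hypothetical stand-ins for the long inline styles in the real report, the second finding is invented sample data, and collect_findings is a hypothetical helper.

```python
import csv
import io

from bs4 import BeautifulSoup

# Simplified, well-formed sample. The "title"/"body" classes and the second
# finding are hypothetical; the real report identifies titles by inline style.
HTML = """
<div class="title">42263 - Unencrypted Telnet Server</div>
<div class="body">
  <div class="table-wrapper see-also"><table><tbody>
    <tr><td>CVE</td><td><a>CVE-2013-2566</a></td></tr>
    <tr><td>CVE</td><td><a>CVE-2015-2808</a></td></tr>
  </tbody></table></div>
  <h2>tcp/23</h2>
</div>
<div class="title">00000 - Hypothetical Finding Without References</div>
<div class="body">
  <h2>tcp/80</h2>
</div>
"""


def collect_findings(soup):
    """Return one [observation, port, cve_ids] row per finding (hypothetical helper)."""
    rows = []
    for title in soup.find_all("div", class_="title"):
        # Assumption: the details live in the next sibling div.
        body = title.find_next_sibling("div")
        port = body.find("h2")
        # select() on a tag only searches that tag's descendants.
        cves = [a.get_text(strip=True) for a in body.select("div.see-also td > a")]
        rows.append([
            title.get_text(strip=True),
            port.get_text(strip=True) if port else "",
            ";".join(cves),
        ])
    return rows


soup = BeautifulSoup(HTML, "html.parser")
rows = collect_findings(soup)

# Write to an in-memory buffer here; a real script would open Report.csv instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Observation", "Port", "CVE-ID"])
writer.writerows(rows)
print(buffer.getvalue())
```

Searching inside the sibling container, rather than inside the title div, matters because the title div in the question's HTML has no child elements of its own.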
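Note the use of CSS selectors with select(). I also used >, the child combinator.
The child combinator (>) is placed between two CSS selectors. It matches only those elements matched by the second selector that are the children of elements matched by the first.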
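The difference between the child combinator and the descendant combinator (a space) can be seen in a small sketch. The HTML string below is a made-up, minimal table used only for illustration.

```python
from bs4 import BeautifulSoup

# Minimal illustrative table; not taken from the real report.
HTML = "<table><tbody><tr><td><a>CVE-2013-2566</a></td></tr></tbody></table>"
soup = BeautifulSoup(HTML, "html.parser")

# The <a> is not a direct child of <table>, so the child combinator finds nothing.
direct_children = soup.select("table > a")

# Spelling out every parent/child step reaches the link.
child_chain = soup.select("table > tbody > tr > td > a")

# The descendant combinator (a space) matches at any depth.
descendants = soup.select("table a")

print(len(direct_children), len(child_chain), len(descendants))
```

The full child chain is stricter: it stops matching if the markup gains or loses a nesting level, whereas the descendant form tolerates such changes.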