Unable to print the values corresponding to a specific "div" element using Beautiful Soup

Time: 2018-04-05 12:01:11

Tags: python html beautifulsoup

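I want to use Beautiful Soup to print the CVE IDs "CVE-2013-2566" and "CVE-2015-2808" from the References section, together with "tcp 23", all of which correspond to the "Unencrypted Telnet Server" entry. I cannot work out the logic for this.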

 <div xmlns="" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;">42263 - Unencrypted Telnet Server</div>
    <div xmlns="" style="margin: 0 0 45px 0;">
    <div class="details-header">Risk Factor<div class="clear"></div>
    </div>
    <div style="line-height: 20px; padding: 0 0 20px 0;">Medium<div class="clear"></div>
    <div class="details-header">Plugin Information: <div class="clear"></div>
    </div>
    <div style="line-height: 20px; padding: 0 0 20px 0;">Published: 2009/10/27, Modified: 2015/10/21<div class="clear"></div>
    </div>
    <div class="details-header">**References**<div class="clear"></div>
</div>
<div id="idm8894160" style="display: block;" class="table-wrapper see-also">
<table cellpadding="0" cellspacing="0">
<thead><tr>
<th width="15%"></th>
<th width="85%"></th>
</tr></thead>
<tbody>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff"><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-2566" target="_blank">CVE-2013-2566</a></td>
</tr>
<tr class="">
<td class="#ffffff">CVE</td>
<td class="#ffffff"><a href="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-2808" target="_blank">CVE-2015-2808</a></td>
</tr>
</tbody>
    <div class="details-header">Plugin Output<div class="clear"></div>
    </div>
    <h2>tcp/23</h2>

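This is what I have written so far; I am stuck at the point where I left a comment. I am a beginner with bs4, so please bear with me. I have to submit a report tomorrow, so please help.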

from bs4 import BeautifulSoup
import csv
import urllib.request as urllib2

with open(r"C:\Users\sourabhk076\Documents\CHIDRMUM_DR8016CHI1_CTSINWDB01_9xtqpj.html") as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')

f = csv.writer(open("Report.csv", "w"))
f.writerow(["Observation", "Port", "CVE-ID"])

medium = soup.find_all('div', attrs={'style':'box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;'})
####this will search for text "Unencrypted telnet server"####
for x in medium:
    port = x.find('h2')
    cve = x.find('div', class_='table-wrapper see-also').findAll('tr')
    ######## don't know what to do next #############
    obsv = x.text
    portd = port.text
    print([obsv,portd,cve])

2 answers:

Answer 0 (score: 0)

You can search for child tags within a tag. Perhaps something like this:

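# Assumes cve is the "table-wrapper see-also" div itself, i.e. without the trailing .findAll('tr')
tbody = cve.find("tbody")
for row in tbody.find_all("tr"):
    print(row.find_all("td")[1].text)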
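
To make the idea above concrete, here is a minimal, self-contained sketch. It assumes a trimmed-down, well-formed stand-in for the References table from the question (the `html` string and the `wrapper` and `cve_ids` names are illustrative, not part of the original report), and it collects the text of the second cell of each row.

```python
from bs4 import BeautifulSoup

# Trimmed-down, well-formed stand-in for the References table in the question.
html = """
<div class="table-wrapper see-also">
<table>
<tbody>
<tr><td>CVE</td><td><a href="#">CVE-2013-2566</a></td></tr>
<tr><td>CVE</td><td><a href="#">CVE-2015-2808</a></td></tr>
</tbody>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the wrapper div by one of its CSS classes, then walk the rows of its tbody.
wrapper = soup.find("div", class_="table-wrapper")
cve_ids = []
for row in wrapper.find("tbody").find_all("tr"):
    cells = row.find_all("td")
    # The second cell holds the link whose text is the CVE ID.
    cve_ids.append(cells[1].get_text(strip=True))

print(cve_ids)
```

On the real report file the markup is not fully closed, so the parser choice may affect how the rows nest; the sketch only illustrates the row-by-row lookup.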

Answer 1 (score: 0)

Code:

from bs4 import BeautifulSoup

with open('/path/to/some.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

service = soup.find('div', style='box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;').get_text(strip=True)
cve_ids = [cve_elem.text for cve_elem in soup.select('table > tbody > tr > td > a')]
protocol, port = soup.select_one('table > h2').text.split('/')
print('{}, {}/{}, CVE-IDs: {}'.format(service, protocol, port, cve_ids))

Output:

42263 - Unencrypted Telnet Server, tcp/23, CVE-IDs: ['CVE-2013-2566', 'CVE-2015-2808']
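
The script in the question also writes a Report.csv with the columns "Observation", "Port" and "CVE-ID". As a rough sketch of how the extracted values might be written out, the snippet below uses a hypothetical helper `write_report` and joins multiple CVE IDs with a semicolon; both the helper name and the separator are assumptions, not something the question specifies. An in-memory buffer stands in for the file.

```python
import csv
import io


def write_report(fileobj, observation, port, cve_ids):
    """Write a header row and one finding row to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["Observation", "Port", "CVE-ID"])
    # Assumption: several CVE IDs share one cell, separated by semicolons.
    writer.writerow([observation, port, ";".join(cve_ids)])


# In-memory stand-in for open("Report.csv", "w", newline="").
buffer = io.StringIO()
write_report(
    buffer,
    "42263 - Unencrypted Telnet Server",
    "tcp/23",
    ["CVE-2013-2566", "CVE-2015-2808"],
)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends, so that no extra blank lines appear on Windows.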

Note the use of CSS selectors with select(). I also used the > child combinator:

  

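The child combinator (>) is placed between two CSS selectors. It matches only those elements matched by the second selector that are direct children of elements matched by the first.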
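
The difference between the child combinator and the plain descendant (space) combinator can be seen in a small sketch. The markup and the names `child_only` and `descendants` below are made up for illustration and are not taken from the report in the question.

```python
from bs4 import BeautifulSoup

# One <p> is a direct child of #outer, the other is nested one level deeper.
html = "<div id='outer'><p>direct</p><section><p>nested</p></section></div>"
soup = BeautifulSoup(html, "html.parser")

# "#outer > p" matches only <p> elements that are direct children of #outer.
child_only = [p.get_text() for p in soup.select("#outer > p")]

# "#outer p" matches every <p> anywhere below #outer.
descendants = [p.get_text() for p in soup.select("#outer p")]

print(child_only)
print(descendants)
```

Here the first selector should pick up only the "direct" paragraph, while the second picks up both.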