How do I strip out the following using Python and Beautiful Soup and leave the rest?
The other items in td need to be kept.
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
Here is the full HTML:
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.151</td>
<td align="left" width="100">IND056GIC151</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.152</td>
<td align="left" width="100">IND056GIC152</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>
This is what I have coded so far:
from ntlm import HTTPNtlmAuthHandler
from bs4 import BeautifulSoup
import requests, os, bleach, urllib2, cookielib
os.system('clear')
user = 'user'
password = "pass"
url = "url"
cookies = cookielib.CookieJar()
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))
pagedata=opener.open(url)
soup=BeautifulSoup(pagedata)
def myfunction(b):
    table = b.find('ul', {'class': 'dfwp-column dfwp-list'})
    for a in table.findAll('a'):
        [a.decompose() for a in table("a")]
    for tr in table.findAll('tr'):
        for td in tr.findAll('td'):
            print td
myfunction(soup)
This is the current output:
Device Type IP Address Device Name Notes
AudioCodes Gateway 172.31.31.2
FXO
Device Type IP Address Device Name Notes
IC Server 172.31.56.151 IND056GIC151 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151
IC Server 172.31.56.152 IND056GIC152 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152
Media Server IND1106HMS07 IND1106HMS07
Media Server IND1106HMS07 IND1106HMS07
Answer 0 (score: 1)
Generally, when people ask how to "strip out" something with bs4, they are really just asking how to leave it out of the find operation.
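You want to exclude the extra whitespace (i.e. tags with tag.text == '') and those four "column header" tags. You could handle the latter with a CSS selector, but the former needs explicit filtering anyway. So the easiest approach is to do both in the same filter, which in my opinion is also more explicit: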
soup = BeautifulSoup(that_long_html_you_gave)
blacklist = {'Device Type','IP Address','Device Name','Notes'}
table = soup.body # to match your variable name. I think.
table.find_all(lambda tag: tag.text and tag.text not in blacklist)
Out[45]:
[<td align="left" width="150">AudioCodes Gateway</td>,
<td align="left" width="115">172.31.31.2</td>,
<td align="left" width="215">FXO</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.151</td>,
<td align="left" width="100">IND056GIC151</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.152</td>,
<td align="left" width="100">IND056GIC152</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>]
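For anyone who wants to try the filtering idea on its own, here is a minimal self-contained sketch in Python 3. It rests on a few assumptions that are not in the question: the HTML is a shortened sample wrapped in table/tr tags, the parser is "html.parser", and is_data_cell is a hypothetical helper name. Because the sample has table and tr wrappers, the filter checks the tag name so that only td cells match. The second half shows the CSS selector route for the header cells, which should work on bs4 4.7 or later, where select() is backed by soupsieve; empty cells still have to be filtered explicitly.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (assumption: wrapped in
# <table>/<tr> so the cells have a well-formed parent).
html = """
<table>
<tr>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
</tr>
<tr>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
blacklist = {"Device Type", "IP Address", "Device Name", "Notes"}


def is_data_cell(tag):
    """Hypothetical helper: True for a non-empty <td> that is not a column header."""
    if tag.name != "td":
        return False
    text = tag.get_text(strip=True)
    return bool(text) and text not in blacklist


# Approach 1: one explicit filter function handles both exclusions.
values = [td.get_text(strip=True) for td in soup.find_all(is_data_cell)]
print(values)

# Approach 2: a CSS selector skips the styled header cells, then the
# empty cells are dropped with an explicit check.
css_values = [
    td.get_text(strip=True)
    for td in soup.select("td:not([style])")
    if td.get_text(strip=True)
]
print(css_values)
```

Both lists contain only the data cells of the sample. Note that the selector version relies on the header cells being the only ones with a style attribute, which holds for the HTML shown in the question but may not hold for the real page.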