I have this code:
for t in tables:
    print ""
    my_table = t
    rows = my_table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        i = 0
        for td in cols:
            text = str(td.text).strip()
            print "{}{}".format(text if text !="" else "IP","|"),
            i=i+1
            if i == 2:
                print ""
                i = 0
        pass
`tables` is a list of HTML tables. I am using BeautifulSoup to parse them.
Currently, the output I get is:
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
What I would like to get is:
Interface in | port-channel8.53                                    |
IP           | 172.18.153.126/255.255.255.252                      |
Router       | bob                                                 |
Route        | route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103                                     |
IP           | 172.18.145.105/255.255.255.252                      |
"Placeholder"| another ip in the same td as the one up             |
"Placeholder"| another ip in the same td as the one up             |
How can I get this output?
EDIT
Here is what one table looks like:
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
(Yes, the blanks are in the real page.)
EDIT2: The problematic code:
<td>
195.233.112.4/255.255.255.0<br>
195.233.112.15/255.255.255.0<br>
195.233.112.3/255.255.255.0<br>
<br><br><br></td>
EDIT3:
Sample code 2 (this one causes a problem with the solution):
<table class="nitrestable">
<tr>
<td>Interface in</td>
<td>GigabitEthernet1/1.103 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.252<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>*grt</td>
</tr>
<tr>
<td>Route</td>
<td>route: 172.18.145.106/255.255.255.128 gw 172.18.145.106</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan71 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.0<br>
172.18.146.106/255.255.255.0<br>
172.18.147.106/255.255.255.0<br></br></br></br></td></tr>
</table>
Answer 0 (score: 1)
You can supply a format specifier, for example:
print "{0:14}|".format(text or "IP"),
or use str.ljust to pad the string you pass to format:
print "{}|".format(str.ljust(text or "IP", 14)),
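However (as dilbert just pointed out in the comments), you still need to do some work to figure out the width each column requires.
Note that, since the empty string "" evaluates to False in a boolean context, you can simplify the if condition, and since the pipe '|' never changes, you can put it directly in the template.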
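As a rough sketch of the "figure out the width first" step: collect the rows, take the longest cell in each column, and pass that width into the format specifier. The sample rows and the helper name format_row are made up for illustration, and the snippet uses print() so it also runs on Python 3.

```python
# Hypothetical sample rows; in the question these come from the parsed table.
rows = [
    ("Interface in", "port-channel8.53"),
    ("", "172.18.153.126/255.255.255.252"),
    ("Router", "bob"),
]

# Substitute "IP" for empty labels first, so the widths account for it.
rows = [(label or "IP", value) for label, value in rows]

# One width per column: the length of that column's longest cell.
widths = [max(len(cell) for cell in column) for column in zip(*rows)]


def format_row(row, widths):
    """Left-justify each cell to its column width and close it with '|'."""
    return "".join("{0:{1}}|".format(cell, width)
                   for cell, width in zip(row, widths))


lines = [format_row(row, widths) for row in rows]
print("\n".join(lines))
```

Because the width is computed from the data, the hard-coded 14 from the examples above is no longer needed.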
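Answer 1 (score: 0)
It helps to parse the rows/columns into lists first and evaluate them afterwards. That makes it easy to compute the maximum width of each column (w1 and w2 in the code). As others have said, once the widths are known, str.format() is what you want.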
for t in tables:
    col = [[], []]
    my_table = t
    rows = my_table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        i = 0
        for td in cols:
            text = str(td.text).strip()
            col[i].append(text if text else "IP")
            i = i + 1
            if i == 2:
                # Note: td.text normally has the tags stripped, so this
                # branch only fires if '<br>' survives as literal text.
                if '<br>' in text:
                    text = text.replace('</br>', '')  # ignore </br>
                    for part in text.split('<br>')[1:]:  # first element has already been processed
                        if part:  # only append if there is content
                            col[0].append(col[0][-1])  # duplicate the last entry of col[0]
                            col[1].append(part)
                i = 0
    # Column widths: the longest entry in each column
    w1 = max([len(x) for x in col[0]])
    w2 = max([len(x) for x in col[1]])
    for i in range(len(col[1])):
        s = '{: <{}}|{: <{}}|'.format(col[0][i], w1, col[1][i], w2)
        print(s)
To explain the str.format() call: '{: <{}}'.format(x, y) creates a space-padded, left-aligned string from the text x with a width of y.
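A tiny example of that nested format field may help; the values "IP" and 6 are arbitrary. The first argument fills the outer field (the text) and the second fills the inner {} (the width).

```python
text, width = "IP", 6

# The inner {} is replaced by width, so this behaves like "{: <6}".
padded = "{: <{}}".format(text, width)

print(repr(padded))              # the text followed by padding spaces
print(padded == text.ljust(width))  # str.ljust gives the same result
```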
EDIT: added extra parsing for multiple IPs (or any field) in the second column that are separated by <br>.
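One way to think about the multiple-IP case, independent of the parser: treat each second-column cell as a list of text fragments and emit one output row per non-empty fragment, repeating the label. This is only a sketch. The function name expand_rows and the sample data are hypothetical, and how you obtain the per-cell fragments from BeautifulSoup is left out here.

```python
def expand_rows(rows, placeholder="IP"):
    """Flatten (label, [fragments]) pairs into (label, value) rows.

    Empty fragments are dropped, each remaining fragment gets its own
    row, and an empty label is replaced by the placeholder.
    """
    flat = []
    for label, fragments in rows:
        label = label.strip() or placeholder
        values = [f.strip() for f in fragments if f.strip()]
        # Keep a row even when the cell has no content at all.
        for value in values or [""]:
            flat.append((label, value))
    return flat


# Hypothetical fragments, modelled on the last two rows of sample code 2.
rows = [
    ("Interface out", ["Vlan71 (*global)"]),
    ("", ["172.18.145.106/255.255.255.0\n",
          "172.18.146.106/255.255.255.0\n",
          "172.18.147.106/255.255.255.0",
          ""]),
]

flat = expand_rows(rows)
for label, value in flat:
    print("{}|{}|".format(label, value))
```

The flattened rows can then go through the same width calculation as before, since every row is again a plain pair of strings.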
Answer 2 (score: 0)
Here is a 'simpler' script. Look up the enumerate built-in function in Python.
import BeautifulSoup
raw_str = \
'''
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
'''
org_str = \
'''
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
'''
print org_str
soup = BeautifulSoup.BeautifulSoup(raw_str)
tables = soup.findAll('table')
for cur_table in tables:
    print ""
    col_sizes = {}
    # Figure out the column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        cur_col_sizes = {col : max(len(td.text), col_sizes.get(col, 0)) for (col, td) in enumerate(tds)}
        col_sizes.update(cur_col_sizes)
    # Print the data, padded using the detected column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        line_strs = [("%%-%ds" % col_sizes[col]) % (td.text or "IP") for (col, td) in enumerate(tds)]
        line_str = "| %s |" % " | ".join(line_strs)
        print line_str