Even spacing when printing

Time: 2014-05-09 10:58:55

Tags: python string beautifulsoup

I have this code:

for t in tables:
    print ""
    my_table = t
    rows = my_table.findAll('tr')
    for tr in rows:
      cols = tr.findAll('td')
      i = 0
      for td in cols:
          text = str(td.text).strip()
          print "{}{}".format(text if text !="" else "IP","|"),
          i=i+1
          if i == 2:
            print ""
            i = 0

`tables` is a list of HTML tables. I am using BeautifulSoup to parse them.

Currently, the output I get is:

Interface in| port-channel8.53| 
IP| 172.18.153.126/255.255.255.252| 
Router| bob| 
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106| 
Interface out| Ethernet2/5.103| 
IP| 172.18.145.105/255.255.255.252| 

What I would like to get is:

Interface in | port-channel8.53                                    | 
IP           | 172.18.153.126/255.255.255.252                      |  
Router       | bob                                                 |  
Route        | route: 192.168.178.0/255.255.128.0 gw 172.18.145.106| 
Interface out| Ethernet2/5.103                                     | 
IP           | 172.18.145.105/255.255.255.252                      |
"Placeholder"| another ip in the same td as the one up             |
"Placeholder"| another ip in the same td as the one up             |

How can I get this output?

EDIT:

Here is how one of the tables is built:

<table>
<tr>
    <td>Interface in</td>
    <td>Vlan800 (bob)</td>
</tr>
<tr>
    <td></td>
    <td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
    <td>Router</td>
    <td>bob2</td>
</tr>
<tr>
    <td>Route</td>
    <td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
    <td>Interface out</td>
    <td>Vlan1145 (bob3)</td>
</tr>
<tr>
    <td></td>
    <td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>

(Yes, the empty cells are there in the real page.)

EDIT2: The problematic markup:

<td>
195.233.112.4/255.255.255.0<br>
195.233.112.15/255.255.255.0<br>
195.233.112.3/255.255.255.0<br>
<br><br><br></td>

EDIT3:

Sample markup 2 (which causes problems with the solution):

<table class="nitrestable">
<tr>
    <td>Interface in</td>
    <td>GigabitEthernet1/1.103 (*global)</td>
</tr>
<tr>
    <td></td>
    <td>172.18.145.106/255.255.255.252<br></br></td>
</tr>
<tr>
    <td>Router</td>
    <td>*grt</td>
</tr>
<tr>
    <td>Route</td>
    <td>route: 172.18.145.106/255.255.255.128 gw 172.18.145.106</td>
</tr>
<tr>
    <td>Interface out</td>
    <td>Vlan71 (*global)</td>
</tr>
<tr>
    <td></td>
    <td>172.18.145.106/255.255.255.0<br>
        172.18.146.106/255.255.255.0<br>
        172.18.147.106/255.255.255.0<br></br></br></br></td></tr>
</table>

3 Answers:

Answer 0: (score: 1)

You can supply a format specifier, e.g.

print "{0:14}|".format(text or "IP"),

or use str.ljust to pad the string you pass to format:

print "{}|".format(str.ljust(text or "IP", 14)),

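However (as dilbert just pointed out in the comments), you will need to do something to work out the width each column requires.

Note that because the empty string "" evaluates to False in a boolean context, you can simplify the if condition, and since the pipe '|' never changes, you can put it directly in the template.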
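
To make the width computation mentioned above concrete, here is a minimal sketch that derives the width from the data instead of hard-coding 14. The `labels` list is made-up sample data rather than anything parsed from the page, and `pad_labels` is a hypothetical helper name. It is written so that it should run on both Python 2.7 and Python 3.

```python
def pad_labels(labels, placeholder="IP"):
    """Return each label left-justified to the longest label's width, followed by '|'."""
    # Empty labels fall back to the placeholder, as in the question.
    filled = [label or placeholder for label in labels]
    width = max(len(label) for label in filled)
    # The nested {1} supplies the field width from the second argument.
    return ["{0:{1}}|".format(label, width) for label in filled]


# Hypothetical sample data standing in for the first column of one table.
labels = ["Interface in", "", "Router", "Route", "Interface out", ""]
for line in pad_labels(labels):
    print(line)
```

The same measurement would have to be repeated for the second column before any row is printed, which is why the rows need to be collected first.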

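Answer 1: (score: 0)

It helps to parse the rows/columns into lists first and only then evaluate them. That makes it easy to compute the maximum width of each column (w1 and w2 in the code). As others have said, once the widths are known, str.format() is what you want.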

for table in tables:
    col = [[], []]  # col[0]: labels, col[1]: values
    for tr in table.findAll('tr'):
        cells = tr.findAll('td')
        if len(cells) < 2:
            continue  # skip rows that do not hold a label/value pair
        label = ''.join(cells[0].findAll(text=True)).strip() or "IP"
        # td.text contains no <br> tags, so collect the text nodes between them instead
        values = [s.strip() for s in cells[1].findAll(text=True) if s.strip()]
        for value in values or [""]:
            col[0].append(label)  # repeat the label for every additional value
            col[1].append(value)
    if not col[0]:
        continue  # nothing to print for an empty table
    w1 = max(len(x) for x in col[0])
    w2 = max(len(x) for x in col[1])
    for label, value in zip(col[0], col[1]):
        print('{: <{}}|{: <{}}|'.format(label, w1, value, w2))

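To explain str.format(): '{: <{}}'.format(x, y) creates a space-padded, left-aligned string from the text x with a width of y.

EDIT: Added extra parsing for multiple IPs (or any other field) when the values in the second column are separated by <br>.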
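
A quick way to check how that nested field behaves is to try it on a literal value. The string and the width below are arbitrary examples, and the right-aligned variant is shown only for comparison.

```python
# '{: <{}}' reads as: fill with ' ', align left ('<'), width taken from the next argument.
left = '{: <{}}'.format('Route', 10)
# '>' right-aligns instead.
right = '{: >{}}'.format('Route', 10)

# repr() makes the padding visible.
print(repr(left))
print(repr(right))
```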

Answer 2: (score: 0)

Here is a 'simpler' script. Look up the enumerate built-in in Python.

import BeautifulSoup

raw_str = \
'''
<table>
<tr>
    <td>Interface in</td>
    <td>Vlan800 (bob)</td>
</tr>
<tr>
    <td></td>
    <td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
    <td>Router</td>
    <td>bob2</td>
</tr>
<tr>
    <td>Route</td>
    <td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
    <td>Interface out</td>
    <td>Vlan1145 (bob3)</td>
</tr>
<tr>
    <td></td>
    <td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
'''

org_str = \
'''
Interface in| port-channel8.53| 
IP| 172.18.153.126/255.255.255.252| 
Router| bob| 
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106| 
Interface out| Ethernet2/5.103| 
IP| 172.18.145.105/255.255.255.252| 
'''

print org_str

soup = BeautifulSoup.BeautifulSoup(raw_str)

tables = soup.findAll('table')
for cur_table in tables:
    print ""
    col_sizes = {}

    # Figure out the column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        # Measure the "IP" placeholder too, since it is printed in place of empty cells
        cur_col_sizes = {col : max(len(td.text or "IP"), col_sizes.get(col, 0)) for (col, td) in enumerate(tds)}
        col_sizes.update(cur_col_sizes)

    # Print the data, padded using the detected column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        line_strs = [("%%-%ds" % col_sizes[col]) % (td.text or "IP") for (col, td) in enumerate(tds)]
        line_str  = "| %s |" % " | ".join(line_strs)
        print line_str
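
The two-pass idea (measure first, then print) does not depend on BeautifulSoup. Below is a small sketch of the same approach on plain lists of strings, written so that it should also run on Python 3. `format_rows` and the sample rows are illustrative assumptions and not part of the script above. It uses '%-*s', which takes the width as an argument, instead of building the format string in two steps.

```python
def format_rows(rows, placeholder="IP"):
    """Pad every cell to its column's maximum width and join cells with ' | '."""
    # Substitute the placeholder first so it is counted when measuring widths.
    cleaned = [[cell.strip() or placeholder for cell in row] for row in rows]

    # First pass: find the widest cell for each column index.
    col_sizes = {}
    for row in cleaned:
        for col, cell in enumerate(row):
            col_sizes[col] = max(len(cell), col_sizes.get(col, 0))

    # Second pass: '%-*s' left-justifies a string to the width passed alongside it.
    lines = []
    for row in cleaned:
        cells = ["%-*s" % (col_sizes[col], cell) for col, cell in enumerate(row)]
        lines.append("| %s |" % " | ".join(cells))
    return lines


# Sample rows copied from the first table in the question.
rows = [
    ["Interface in", "Vlan800 (bob)"],
    ["", "172.26.128.3/255.255.255.224"],
    ["Router", "bob2"],
]
for line in format_rows(rows):
    print(line)
```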