Remove characters from output

Time: 2013-09-04 18:11:35

Tags: python regex beautifulsoup

I have the following structure, generated with bs4 in Python:

['Y10765227', '9884877926, 9283183326', '', 'Dealer', 'Rgmuthu']
['L10038779', '9551154555', ',', ',']
['R10831945', '9150000747, 9282109134, 9043728565', ',', ',']
['B10750123', '9952946340', '', 'Dealer', 'Bala']
['R10763559', '9841280752, 9884797013', '', 'Dealer', 'Senthil']

I want to remove the extra characters so that I get something like the following:

9884877926, 9283183326, Dealer, Rgmuthu
9551154555
9150000747, 9282109134, 9043728565
9952946340 , Dealer, Bala
9841280752, 9884797013, Dealer, Senthil

I am using print re.findall("'([a-zA-Z0-9,\s]*)'", eachproperty['onclick'])

So basically I want to remove the "[]", the "''", the fields that contain only ",", and the random ID at the beginning.

Update

onclick="try{appendPropertyPosition(this,'Y10765227','9884877926, 9283183326','','Dealer','Rgmuthu');jsb9onUnloadTracking();jsevt.stopBubble(event);}catch(e){};"

So I am scraping this onclick attribute to get the data mentioned above.
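
For reference, the extraction step described above can be reproduced roughly as follows. This is a minimal Python 3 sketch, not the asker's exact code: the variable names `onclick` and `args` are assumptions, and the pattern `'([^']*)'` is a slightly looser stand-in for the character class used in the question.

```python
import re

# The onclick attribute value from the update, as a plain string.
onclick = (
    "try{appendPropertyPosition(this,'Y10765227','9884877926, 9283183326',"
    "'','Dealer','Rgmuthu');jsb9onUnloadTracking();"
    "jsevt.stopBubble(event);}catch(e){};"
)

# Capture the text between each pair of single quotes; empty fields
# such as '' are kept as empty strings.
args = re.findall(r"'([^']*)'", onclick)

print(args)
# ['Y10765227', '9884877926, 9283183326', '', 'Dealer', 'Rgmuthu']
```

The list printed here has the same shape as the rows shown at the top of the question, with the ID in the first position.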

1 answer:

Answer 0: (score: 2)

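You can use a combination of str.join and str.translate here. Slicing with item[1:] skips the ID, and a field is kept only if something is left after its punctuation and whitespace are stripped: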

>>> from string import punctuation, whitespace
>>> lis = [['Y10765227', '9884877926, 9283183326', '', 'Dealer', 'Rgmuthu'],
...        ['L10038779', '9551154555', ',', ','],
...        ['R10831945', '9150000747, 9282109134, 9043728565', ',', ','],
...        ['B10750123', '9952946340', '', 'Dealer', 'Bala'],
...        ['R10763559', '9841280752, 9884797013', '', 'Dealer', 'Senthil']]
>>> for item in lis:
...     print ", ".join(x for x in item[1:]
...                     if x.translate(None, punctuation + whitespace))
...
9884877926, 9283183326, Dealer, Rgmuthu
9551154555
9150000747, 9282109134, 9043728565
9952946340, Dealer, Bala
9841280752, 9884797013, Dealer, Senthil
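
The session above is Python 2: it relies on the print statement and on the two-argument form str.translate(None, deletechars), neither of which exists in Python 3. A possible Python 3 equivalent is sketched below. The helper name `clean_row` and the variable `rows` are hypothetical and introduced only for illustration; the filtering logic is the same as in the answer.

```python
from string import punctuation, whitespace

# Translation table that deletes every punctuation and whitespace character.
STRIP_TABLE = str.maketrans("", "", punctuation + whitespace)


def clean_row(row):
    """Drop the leading ID, discard fields that are empty once punctuation
    and whitespace are removed, and join the remaining fields with ', '."""
    return ", ".join(x for x in row[1:] if x.translate(STRIP_TABLE))


rows = [
    ['Y10765227', '9884877926, 9283183326', '', 'Dealer', 'Rgmuthu'],
    ['L10038779', '9551154555', ',', ','],
    ['R10831945', '9150000747, 9282109134, 9043728565', ',', ','],
    ['B10750123', '9952946340', '', 'Dealer', 'Bala'],
    ['R10763559', '9841280752, 9884797013', '', 'Dealer', 'Senthil'],
]

for row in rows:
    print(clean_row(row))
```

Note that the translated string is used only as a truth test; the original field is what gets joined, so the commas inside the phone-number fields are preserved.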