Let's say I have a list of lists like this:
lis_ = [['"Fun is the enjoyment of pleasure"\t\t',
'@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t','Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware https://t.co/k9sOEpKjbg\t\t'],
['I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated http://t.co/heyOhpb1\t\t", "@username Don't use my family surname for your app ???? http://t.co/1yYLXIO9\t\t"]
]
I want to remove the links from each sublist, so I tried this regex:
new_list = re.sub(r'^https?:\/\/.*[\r\n]*', '', tweets, flags=re.MULTILINE)
I use the MULTILINE flag because when I print list_ it looks like this:
[]
[]
[]
...
[]
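The problem with this approach is that I obviously get TypeError: expected string or buffer, because I can't pass sublists like these to the regex. How can I apply the regex above to the sublists in list_ to get something like this (i.e. the sublists without any kind of link):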
[['"Fun is the enjoyment of pleasure"\t\t',
'@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t','Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware'],
['I just became the mayor of Porta Romana on @username! \t\t', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated \t\t", "@username Don't use my family surname for your app ????\t\t"]
]
Can this be done with map, or is there another efficient way?
Thanks in advance.
Answer 0 (score: 1)
It looks like you have a list of lists of strings.
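That is also why the original call fails: `re.sub` takes a single string (or bytes) as its subject, not a list, so the pattern has to be applied to each string individually. A minimal reproduction, using a hypothetical two-item sublist and assuming a URL ends at the first whitespace character (Python 3 words the error message differently from the one in the question):

```python
import re

sublist = ['no link here\t\t', 'see http://t.co/heyOhpb1\t\t']

try:
    # Passing the whole list instead of a string is the mistake
    re.sub(r'https?://\S+', '', sublist)
except TypeError as exc:
    print('TypeError:', exc)

# Applying the pattern to each string separately works
cleaned = [re.sub(r'https?://\S+', '', s) for s in sublist]
print(cleaned)  # ['no link here\t\t', 'see \t\t']
```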
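In that case, you just need to iterate over those lists the right way. The function below keeps only the strings that contain no link: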
import re
list_ = [['blablablalba', 'blabalbablbla', 'blablala', 'http://t.co/xSnsnlNyq5'], ['blababllba', 'blabalbla', 'blabalbal'], ['http://t.co/xScsklNyq5'], ['blablabla', 'http://t.co/xScsnlNyq3']]

def remove_links(sublist):
    # Keep only the strings that contain no http:// or https:// link
    return [s for s in sublist if not re.search(r'https?://', s)]

final_list = list(map(remove_links, list_))  # list() because map() is lazy in Python 3
# [['blablablalba', 'blabalbablbla', 'blablala'], ['blababllba', 'blabalbla', 'blabalbal'], [], ['blablabla']]
If you also want to drop any empty sublists afterwards:
final_final_list = [l for l in final_list if l]
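Note that remove_links drops every string that contains a link. If, as in the question, you want to keep each string and cut out only the URL, a nested map works too. This is a sketch on a shortened version of the question's data, assuming a URL ends at the first whitespace character (so the trailing \t\t survives); URL_RE and strip_urls are illustrative names, not part of either answer:

```python
import re
from functools import partial

# Assumption: a URL runs from http:// or https:// up to the next whitespace
URL_RE = re.compile(r'https?://\S+')

# Pattern.sub(repl, string): fixing repl='' gives a one-argument function
strip_urls = partial(URL_RE.sub, '')

lis_ = [['Malware https://t.co/k9sOEpKjbg\t\t', 'no link\t\t'],
        ['on @username! http://4sq.com/9QROVv\t\t']]

# The outer map walks the sublists, the inner map walks the strings in each one
cleaned = list(map(lambda sub: list(map(strip_urls, sub)), lis_))
print(cleaned)  # [['Malware \t\t', 'no link\t\t'], ['on @username! \t\t']]
```

Unlike remove_links, this keeps every string and the original nesting, so no empty sublists appear.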
Answer 1 (score: 1)
You need to use \b instead of the start-of-line anchor ^.
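The reason is that ^, even with re.MULTILINE, only matches at the start of the string or right after a newline, while these links sit in the middle of each tweet. A quick check on one of the question's strings:

```python
import re

tweet = 'I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t'

# ^ cannot match here: the URL is neither at the start nor after a newline
anchored = re.sub(r'^https?://.*', '', tweet, flags=re.MULTILINE)

# \b matches at the word boundary between the space and the "h" of "http"
bounded = re.sub(r'\bhttps?://.*', '', tweet)

print(anchored == tweet)  # True: nothing was removed
print(repr(bounded))      # 'I just became the mayor of Porta Romana on @username! '
```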
>>> lis_ = [['"Fun is the enjoyment of pleasure"\t\t',
'@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t','Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware https://t.co/k9sOEpKjbg\t\t'],
['I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated http://t.co/heyOhpb1\t\t", "@username Don't use my family surname for your app ???? http://t.co/1yYLXIO9\t\t"]
]
>>> import re
>>> [[re.sub(r'\bhttps?:\/\/.*[\r\n]*', '', i) for i in x] for x in lis_]
[['"Fun is the enjoyment of pleasure"\t\t', '@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t', 'Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware '], ['I just became the mayor of Porta Romana on @username! ', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated ", "@username Don't use my family surname for your app ???? "]]
Or, with explicit loops:
>>> l = []
>>> for i in lis_:
...     m = []
...     for j in i:
...         m.append(re.sub(r'\bhttps?:\/\/.*[\r\n]*', '', j))
...     l.append(m)
...
>>> l
[['"Fun is the enjoyment of pleasure"\t\t', '@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t', 'Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware '], ['I just became the mayor of Porta Romana on @username! ', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated ", "@username Don't use my family surname for your app ???? "]]
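Because .* runs to the end of the line, both versions above also delete the trailing \t\t that the question's expected output keeps. A variant that assumes a URL ends at the first whitespace character, using \S+ instead of .*, keeps the tabs as well as the nested list structure:

```python
import re

lis_ = [['"Fun is the enjoyment of pleasure"\t\t',
         '@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t',
         'Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware https://t.co/k9sOEpKjbg\t\t'],
        ['I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t',
         "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated http://t.co/heyOhpb1\t\t",
         "@username Don't use my family surname for your app ???? http://t.co/1yYLXIO9\t\t"]]

# Assumption: a URL stops at the first whitespace, so the trailing tabs survive
url_re = re.compile(r'\bhttps?://\S+')

# One inner list per original sublist, so the nesting is preserved
cleaned = [[url_re.sub('', s) for s in sub] for sub in lis_]
for sub in cleaned:
    print(sub)
```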