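I have a dataframe in which the values of one column (the 'address' column) contain stray spaces. For example: ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']
My data has thousands of records. How can I remove the space between '2' and '47', for example, using a regular expression, to get the following result:
['247, Philiproad, London, uk', '124, Northhall, London, uk']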
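Answer 0 (score: 2)
You could first replace the spaces with empty strings and then add a space back after each comma. Alternatively, a regular expression can drop the space that follows a digit. I tried this: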
>>> import re
>>> string1 = '2 47, Philip road, London, uk'
>>> regex = re.compile(r"(\d) ")  # a digit followed by a space
>>> regex.sub(r"\1", string1)  # keep the digit, drop the space
'247, Philip road, London, uk'
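The first idea mentioned in this answer (remove every space, then put one back after each comma) is not shown in the original post. A minimal sketch of it might look like the following; the function name `strip_then_respace` is hypothetical and chosen only for illustration.

```python
def strip_then_respace(address: str) -> str:
    """Remove every space, then re-insert one space after each comma."""
    no_spaces = address.replace(" ", "")
    return no_spaces.replace(",", ", ")


addresses = ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']
print([strip_then_respace(a) for a in addresses])
# ['247, Philiproad, London, uk', '124, Northhall, London, uk']
```

Note that this approach also joins words inside a field, so 'Philip road' would become 'Philiproad'. If that matters for your data, the regex version above is probably the safer choice.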
Answer 1 (score: 2)
Using regex:
>>> [re.sub(r'(?<=\d) (?=\d)', '', ele) for ele in l]
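This uses the regex concepts of lookahead and lookbehind: the space is removed only when there is a digit immediately before it and a digit immediately after it.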
#driver functions:
IN : ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']
OUT : ['247, Philiproad, London, uk', '124, Northhall, London, uk']
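To illustrate why the lookarounds matter, here is a small self-contained sketch. The names `DIGIT_GAP` and `join_split_numbers` are hypothetical helpers, not part of the answer above. Because lookbehind and lookahead consume no characters, only the space itself is replaced, and spaces between words are left alone.

```python
import re

# A single space with a digit directly before it and a digit directly after it.
# The lookarounds are zero-width, so the digits themselves are not consumed.
DIGIT_GAP = re.compile(r"(?<=\d) (?=\d)")


def join_split_numbers(address: str) -> str:
    """Remove spaces that sit between two digits; leave all other spaces."""
    return DIGIT_GAP.sub("", address)


print(join_split_numbers("2 47, New York, uk"))  # 247, New York, uk
print(join_split_numbers("1 2 3, Northhall"))    # 123, Northhall
```

Since the digits are not part of the match, consecutive gaps such as '1 2 3' are all closed in a single pass.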
Answer 2 (score: 1)
Edited so that New York does not turn into NewYork.
This should take care of the address column (here I assume your dataframe is named df):
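def replace_if_num(s):
    # Drop the spaces only when the field consists of digits and spaces.
    no_spaces = s.replace(' ', '')
    if no_spaces.isdigit():
        return no_spaces
    return s

def foo(s):
    # Clean each comma-separated field, then rejoin the fields with ', '.
    return ', '.join(replace_if_num(part.strip()) for part in s.split(','))

df['address'] = df['address'].map(foo)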
Answer 3 (score: 1)
Good answers have already been given; here is an alternative that uses neither lambda nor re:
# input list
lst = ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']
# remove a space if it exists before the first comma in the element of the lst
result = [a if ' ' not in a.split(',')[0] else a.replace(' ','',1) for a in lst]
print(result)
Output:
['247, Philiproad, London, uk', '124, Northhall, London, uk']