Question

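I have a data frame in which the values of one column (the 'address' column) contain stray spaces. For example: ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']

My data has thousands of records. How can I remove the space between, for example, '2' and '47', using a regular expression, to get the following result:

['247, Philiproad, London, uk', '124, Northhall, London, uk']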

Answer 1

You can match a digit that is followed by a space and keep only the digit. I tried this:

>>> import re
>>> string1 = '2 47, Philip road, London, uk'
>>> # a digit, then a space, but only when another digit follows the space
>>> regex = re.compile(r"(\d) (?=\d)")
>>> regex.sub(r"\1", string1)
'247, Philip road, London, uk'

Answer 2

Using regex:

>>> import re
>>> [re.sub(r'(?<=\d) (?=\d)', '', ele) for ele in l]

This uses the regex concepts of lookahead and lookbehind: a space is removed only when it has a digit on both sides.

#driver values:

IN : ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']
OUT : ['247, Philiproad, London, uk', '124, Northhall, London, uk']
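
To illustrate why the lookarounds matter, here is a minimal sketch using only the standard `re` module. The names `DIGIT_GAP` and `join_digits` are made up for this example, and the third address is an assumed extra test case, not taken from the question.

```python
import re

# Hypothetical name: one or more spaces with a digit on each side.
# The lookbehind and lookahead are zero-width, so only the spaces are consumed.
DIGIT_GAP = re.compile(r'(?<=\d) +(?=\d)')

def join_digits(text):
    """Remove spaces that sit directly between two digits."""
    return DIGIT_GAP.sub('', text)

addresses = [
    '2 47, Philiproad, London, uk',
    '12 4, Northhall, London, uk',
    '1 0 5, New York, us',  # assumed case: several gaps, plus a city containing a space
]

cleaned = [join_digits(a) for a in addresses]
print(cleaned)
```

Because the digits themselves are never part of the match, nothing has to be put back in the replacement string, and spaces between letters (as in "New York") are left alone.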

Answer 3

Edited so that New York does not turn into NewYork.

This should sort out the address column (here I assume your data frame is named df):

def replace_if_num(s):
    no_spaces = s.replace(' ', '')
    if no_spaces.isdigit():
        return no_spaces
    return s

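def foo(s):
    # strip each part so the ', ' join does not double the spaces, and return the result
    return ', '.join(replace_if_num(part.strip()) for part in s.split(','))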

df['address'] = df['address'].map(foo)
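
As a quick check of the "New York" case mentioned in this answer, the sketch below compares stripping every space with the per-field approach. It works on a plain string rather than a data frame, and the names `strip_all_spaces` and `fix_numeric_fields`, as well as the sample address, are assumptions for illustration.

```python
def strip_all_spaces(address):
    """Naive version: remove every space, then put one back after each comma."""
    return ', '.join(part.replace(' ', '') for part in address.split(','))

def fix_numeric_fields(address):
    """Per-field version: collapse a field only if it is all digits once spaces are removed."""
    fixed = []
    for part in address.split(','):
        part = part.strip()
        compact = part.replace(' ', '')
        fixed.append(compact if compact.isdigit() else part)
    return ', '.join(fixed)

sample = '2 47, New York, us'  # assumed example, not from the question
naive = strip_all_spaces(sample)
careful = fix_numeric_fields(sample)
print(naive)
print(careful)
```

Only the per-field version keeps "New York" intact. With a data frame, such a function could be applied the same way as in the answer, through `df['address'].map(...)`.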

Answer 4

Good answers have already been given; here is an alternative that uses neither lambda nor re:

# input list
lst = ['2 47, Philiproad, London, uk', '12 4, Northhall, London, uk']

# remove a space if it exists before the first comma in the element of the lst
result = [a if ' ' not in a.split(',')[0] else a.replace(' ','',1) for a in lst]

print(result)

Output:

['247, Philiproad, London, uk', '124, Northhall, London, uk']
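
Note that `replace(' ', '', 1)` removes only the first space. If the number might contain more than one space, one possible variant (a sketch, still without `re`) splits off the first field with `str.partition`. The name `squash_first_field` and the third sample address are assumptions, and the sketch assumes the first field holds nothing but the number.

```python
def squash_first_field(address):
    """Remove all spaces from the text before the first comma; leave the rest untouched."""
    head, sep, tail = address.partition(',')
    return head.replace(' ', '') + sep + tail

lst = [
    '2 47, Philiproad, London, uk',
    '12 4, Northhall, London, uk',
    '1 0 5, New York, us',  # assumed case with two spaces inside the number
]

result = [squash_first_field(a) for a in lst]
print(result)
```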

Remove spaces between digits in a string
