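I am trying to standardize the address data in my DataFrame.
I can only replace words correctly when a dictionary key maps to a single value. When a key has multiple values, I cannot get the replacement to work.
Input data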
data['Postal_Addr']
UNIT 3510 35/F THE CENTRE
HARNEYS THE CENTRE 99 QUEENS STR ROAD CETRAL
OCBC WING HANG BANK UNIT B 5/FLR WING HANG INSURANCE CENTRE BUILDING
M3 CAPITAL PARTNERS 50 LEVEL 3 CENTRAL
GALERIE H LDT 50 CENTRAL 17TH FLOOR
36/FL INFINITUS PLAZA STREET
51 SAI ST STRAT RAOD
Dictionary
Mapping_matrix.set_index('Orginal')['Related_Words'].to_dict()
{'FLOOR ': 'F,FL,FLR',
'ROAD': 'RD, RD.,RAOD',
'STREET': 'ST, STR,STRAT',
'CENTRAL': 'CENTRE, CETRAL'}
Expected output
UNIT 3510 35/FLOOR THE CENTRAL 99 QUEENS ROAD CENTRAL
HARNEYS THE CENTRAL 99 ROAD CENTRAL
OCBC WING HANG BANK UNIT B 5/FLOOR WING HANG INSURANCE CENTRAL BUILDING
M3 CAPITAL PARTNERS 50 LEVEL 3 CENTRAL
GALERIE LDT 50 CENTRAL 17TH FLOOR
51 SAI STREET STREET ROAD
I tried some code from a related question (How to replace a string using a dictionary containing multiple values for a key in python):
def replace_words(d, col):
    # Build {regex: canonical word}; each variant must stand alone between whitespace or string ends
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
But it did not work.
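One possible reason, as far as I can tell: the lookarounds `(?<!\S)` and `(?!\S)` only match tokens delimited by whitespace, so the `F` in `35/F` is skipped because it follows a `/`, and the unescaped `.` in `RD.` acts as a regex wildcard. Below is a minimal sketch of an alternative that I would expect to handle these cases. It assumes that "not adjacent to a letter" is an acceptable word boundary for this data; the names `variant_to_canonical`, `pattern` and `standardize` are made up for illustration and do not come from the linked answer.

```python
import re

# Mapping from the question: canonical word -> comma-separated variants.
mapping = {
    'FLOOR ': 'F,FL,FLR',
    'ROAD': 'RD, RD.,RAOD',
    'STREET': 'ST, STR,STRAT',
    'CENTRAL': 'CENTRE, CETRAL',
}

# Flatten to variant -> canonical word, stripping stray spaces on both sides.
variant_to_canonical = {
    variant.strip(): canonical.strip()
    for canonical, variants in mapping.items()
    for variant in variants.split(',')
}

# Escape each variant (so the '.' in 'RD.' is literal) and try longer
# variants first, so that 'RD.' is preferred over 'RD'.
alternation = '|'.join(
    re.escape(v) for v in sorted(variant_to_canonical, key=len, reverse=True)
)

# Assumption: a variant counts as a whole word when it is not directly
# preceded or followed by a letter. This lets 'F' in '35/F' match.
pattern = re.compile(rf'(?<![A-Za-z])(?:{alternation})(?![A-Za-z])')


def standardize(text):
    """Replace every known variant in text with its canonical word."""
    return pattern.sub(lambda m: variant_to_canonical[m.group(0)], text)


addresses = [
    'UNIT 3510 35/F THE CENTRE',
    '36/FL INFINITUS PLAZA STREET',
    '51 SAI ST STRAT RAOD',
]
for address in addresses:
    print(standardize(address))
```

On the DataFrame from the question this could presumably be applied with `data['Postal_Addr'] = data['Postal_Addr'].map(standardize)`. Note that the letter-based boundary is a trade-off: a variant glued to a digit, such as the `F` in `35F`, would also be replaced, which may or may not be wanted.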