Abbreviations are replaced more than once: iaw -> in accordance with -> input accordance with

Time: 2019-03-25 13:54:48

Tags: python pandas dictionary

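I am working with a dataframe whose text column contains many abbreviations. Using a predefined dictionary, I replace the abbreviations with the full words, and that part works.

However, abbreviations seem to be replaced more than once. If the full phrase that replaces an abbreviation itself contains another abbreviation, that abbreviation is replaced again: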

import pandas as pd

# keys and values are padded with spaces so that only whole words match
d = {' h ' : ' height ', ' mm ' : ' milimeter ', ' w ' : 'width', ' iaw ' : ' in accordance with ', ' in ' : ' input '}

dt = {"Number":[1, 2], "text": ["measure depth 22 mm h 24 mm w 75 mm", "wheel 4 iaw amm"]}

dataframe = pd.DataFrame(dt) 

def process_data(file_name):
  data = file_name
  data["text"].replace(d, regex=True, inplace=True)
  return data

df = process_data(dataframe)
print(df)

The result is:

   Number                                                 text
0  1       measure depth 22 milimeter height 24 milimeter w 75 mm
1  2       wheel 4 input accordance with amm  

It should be:

   Number                                                 text
0  1       measure depth 22 milimeter height 24 milimeter w 75 mm
1  2       wheel 4 in accordance with amm  

Does anyone know how to solve this?

1 Answer:

Answer 0: (score: 1)

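You can use Series.str.replace with a regex that joins all keys into one pattern with word boundaries and looks each match up in the dictionary. The text is scanned in a single pass, so an inserted phrase is never matched again: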

# whitespace removed from keys and values; \b in the pattern handles word boundaries
d = {'h' : 'height', 
     'mm' : 'milimeter', 
     'w' : 'width',
     'iaw' : 'in accordance with',
     'in' : 'input'}


pat = '|'.join(r"\b{}\b".format(x) for x in d.keys())
dataframe['keyword'] = dataframe['text'].str.replace(pat, lambda x: d[x.group()], regex=True)
print (dataframe)

   Number                                 text  \
0       1  measure depth 22 mm h 24 mm w 75 mm   
1       2                      wheel 4 iaw amm   

                                             keyword  
0  measure depth 22 milimeter height 24 milimeter...  
1                     wheel 4 in accordance with amm  
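
To make the difference between the two behaviours concrete, here is a minimal sketch in plain Python using only the `re` module. The helper names `sequential_replace` and `single_pass_replace` are hypothetical and not part of pandas. The first imitates one replacement per dictionary key, the second uses one combined pattern as in the answer above. `re.escape` and the longest-first ordering are extra precautions that I am assuming are harmless here; they only matter if keys contain regex metacharacters or non-word characters.

```python
import re

d = {'h': 'height',
     'mm': 'milimeter',
     'w': 'width',
     'iaw': 'in accordance with',
     'in': 'input'}

def sequential_replace(text, mapping):
    # Hypothetical helper: one substitution per key, in dictionary order.
    # Later keys can match text that an earlier key has just inserted.
    for abbr, full in mapping.items():
        text = re.sub(r"\b{}\b".format(re.escape(abbr)), full, text)
    return text

def single_pass_replace(text, mapping):
    # Hypothetical helper: one combined pattern, so the original text is
    # scanned once and inserted replacement text is never rescanned.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b{}\b".format(re.escape(k)) for k in keys))
    return pattern.sub(lambda m: mapping[m.group()], text)

cascaded = sequential_replace("wheel 4 iaw amm", d)
expanded = single_pass_replace("wheel 4 iaw amm", d)
print(cascaded)  # wheel 4 input accordance with amm
print(expanded)  # wheel 4 in accordance with amm
```

On a pandas column, the same function could be applied with something like `dataframe['text'].apply(lambda s: single_pass_replace(s, d))`.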

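Another solution is to split each value on whitespace, map every token through the dictionary with get, and join the tokens back together with a space: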

f = lambda x: ' '.join(d.get(y, y) for y in x.split())
dataframe['keyword'] = dataframe['text'].apply(f)
print (dataframe)
   Number                                 text  \
0       1  measure depth 22 mm h 24 mm w 75 mm   
1       2                      wheel 4 iaw amm   

                                             keyword  
0  measure depth 22 milimeter height 24 milimeter...  
1                     wheel 4 in accordance with amm
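
The split-and-join approach has two side effects worth knowing about. The sketch below, again in plain Python with a hypothetical helper name `expand_tokens`, illustrates them: runs of whitespace collapse to a single space, and a token with attached punctuation (such as `iaw,`) is not found in the dictionary and stays unchanged. Whether that matters depends on the data; the sample strings here are invented for illustration.

```python
d = {'h': 'height',
     'mm': 'milimeter',
     'w': 'width',
     'iaw': 'in accordance with',
     'in': 'input'}

def expand_tokens(text, mapping):
    # Hypothetical helper: look up each whitespace-separated token once;
    # tokens that are not keys are kept as they are.
    return ' '.join(mapping.get(token, token) for token in text.split())

clean = expand_tokens("wheel 4 iaw amm", d)
messy = expand_tokens("wheel  4 iaw, amm", d)  # double space and a trailing comma
print(clean)  # wheel 4 in accordance with amm
print(messy)  # wheel 4 iaw, amm
```

If punctuation next to abbreviations is common in the text, the regex version with word boundaries is probably the safer choice.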