pandas: Replace the entire text of a cell using a regular expression

Time: 2017-05-10 11:38:34

Tags: python pandas

My DataFrame has a column named 'qualified'. Its values look like this:

b.tech                           
graduate                         
btech                             
hsc                               
degree                            
12th pass                         
pharm.d 2nd year                  
b pharm                           
pursuing b pharm                  
ssc                               
b.pharm                           
mba                               
bsc                               
no                                
student                           
pharm.d 3rd year                  
b.com                             
bcom                              
ug                                
diploma                           
b tech                            

I want to make the data consistent by replacing certain values with other text. For example, b tech, b.tech, and bachelors in X should become Graduate, and Masters or M.Com should become Post Graduate, and so on. How can I do this using regular expressions?

1 answer:

Answer 0 (score: 2)

You can define a list of search patterns and a matching list of replacement values:

to_replace = [r'SearchRegEx1', r'SearchRegEx2', ...]
value = [r'ReplaceRegEx1', r'ReplaceRegEx2', ...]

Then pass both lists to replace with regex=True:

df['col_name'] = df['col_name'].replace(to_replace, value, regex=True)
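
As a self-contained illustration of the call above, here is a minimal sketch. The sample values come from the question's 'qualified' column; the two patterns are illustrative assumptions, not the placeholders from the answer. Each pattern is anchored with `^` and `$` so that it only fires when it matches the whole cell.

```python
import pandas as pd

# Sample values taken from the question's 'qualified' column.
df = pd.DataFrame({"qualified": ["b.tech", "btech", "b tech", "hsc", "mba", "b.com"]})

# Illustrative patterns (assumptions): each one is paired with the
# replacement value at the same position in `value`.
to_replace = [r"^b[.\s]?tech$", r"^b[.\s]?com$"]
value = ["Graduate", "Graduate"]

# regex=True makes pandas treat the entries of to_replace as regular expressions.
df["qualified"] = df["qualified"].replace(to_replace, value, regex=True)

print(df["qualified"].tolist())
```

Cells that match none of the patterns, such as hsc and mba here, are left unchanged.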

Demo:

In [124]: to_replace = [r'btech|b[\.\s]+\w+|bachelors\b.*', r'Masters|M.Com']
     ...: value = ['Graduate', 'Post Graduate']
     ...:

In [125]: df['col'] = df['col'].replace(to_replace, value, regex=True)

In [126]: df
Out[126]:
         col
0   Graduate
1   graduate
2   Graduate
3        hsc
4     degree
5       12th
6    pharm.d
7          b
8   pursuing
9        ssc
10  Graduate
11       mba
12       bsc
13        no
14   student
15   pharm.d
16  Graduate
17      bcom
18        ug
19   diploma
20         b
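
One caveat worth illustrating: with regex=True, replace substitutes only the part of the string that the pattern matches, much like `re.sub`. In the output above, multi-word cells such as b pharm appear as b, which suggests the demo frame held only the first word of each value, so the difference does not show up there. The sketch below, using a hypothetical Series `s` and patterns that are assumptions rather than part of the answer, contrasts an unanchored pattern with one wrapped in `^.*` and `.*$` so that a match consumes the whole cell.

```python
import pandas as pd

# Hypothetical sample built from values in the question.
s = pd.Series(["pursuing b pharm", "pharm.d 2nd year", "b.tech", "mba"])

# Unanchored: only the matched substring is replaced,
# so any surrounding text in the cell survives.
partial = s.replace(r"b[.\s]?(?:tech|pharm)", "Graduate", regex=True)

# Wrapped in ^.* and .*$: a match spans the entire cell,
# so the whole text is replaced.
whole = s.replace(r"^.*\bb[.\s]?(?:tech|pharm)\b.*$", "Graduate", regex=True)

print(partial.tolist())
print(whole.tolist())
```

With the unanchored pattern, pursuing b pharm becomes pursuing Graduate; with the wrapped pattern the same cell becomes just Graduate, which is closer to what the question asks for.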