My DataFrame has a column named 'qualified'. Its values look like this:
b.tech
graduate
btech
hsc
degree
12th pass
pharm.d 2nd year
b pharm
pursuing b pharm
ssc
b.pharm
mba
bsc
no
student
pharm.d 3rd year
b.com
bcom
ug
diploma
b tech
I want to make the data consistent by replacing certain values with other text.
For example, b tech, b.tech, or bachelors in X becomes Graduate. The same goes for Masters, M.Com, Post Graduate, etc.
How can I do this with regular expressions?
Answer 0 (score: 2)
You can do it like this:
to_replace = [r'SearchRegEx1', r'SearchRegEx2', ...]
value = [r'ReplaceRegEx1', r'ReplaceRegEx2', ...]
Then:
df['col_name'] = df['col_name'].replace(to_replace, value, regex=True)
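To make the template above concrete, here is a minimal self-contained sketch. The sample rows and the exact patterns are assumptions chosen for illustration, not part of the original answer; only the column name 'qualified' and the target labels come from the question. The patterns are anchored with ^ and $ and use (?i) so that a whole cell is matched regardless of case.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame(
    {"qualified": ["b.tech", "b tech", "btech", "hsc", "M.Com", "Masters"]}
)

# (?i) makes each pattern case-insensitive; ^...$ forces a whole-cell match,
# so cells that merely contain similar text are left untouched.
to_replace = [
    r"(?i)^b[.\s]*tech$",
    r"(?i)^(?:masters|m\.com)$",
]
value = ["Graduate", "Post Graduate"]

# Each pattern in to_replace is paired with the value at the same index.
df["qualified"] = df["qualified"].replace(to_replace, value, regex=True)
print(df["qualified"].tolist())
# ['Graduate', 'Graduate', 'Graduate', 'hsc', 'Post Graduate', 'Post Graduate']
```

Anchoring is a design choice here: without it, the replacement affects only the matched substring of a cell rather than the whole value.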
Demo:
In [124]: to_replace = [r'btech|b[\.\s]+\w+|bachelors\b.*', r'Masters|M.Com']
...: value = ['Graduate', 'Post Graduate']
...:
In [125]: df['col'] = df['col'].replace(to_replace, value, regex=True)
In [126]: df
Out[126]:
col
0 Graduate
1 graduate
2 Graduate
3 hsc
4 degree
5 12th
6 pharm.d
7 b
8 pursuing
9 ssc
10 Graduate
11 mba
12 bsc
13 no
14 student
15 pharm.d
16 Graduate
17 bcom
18 ug
19 diploma
20 b
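Because the patterns in the demo are unanchored, a regex replace substitutes only the matched part of a cell, so a value such as 'pursuing b pharm' would likely end up as 'pursuing Graduate'. If the goal is to classify the whole value instead, one alternative sketch is a small rule table checked with re.search. The rule table, the patterns, and the name normalize are hypothetical, introduced here only for illustration.

```python
import re

# Hypothetical rule table: the first matching pattern wins.
# Labels follow the question; the patterns are illustrative guesses.
RULES = [
    (re.compile(r"\b(?:masters|m\.?com|post\s*graduate)\b", re.IGNORECASE),
     "Post Graduate"),
    (re.compile(r"\bb[.\s]*(?:tech|pharm|com|sc)\b|\bbachelors\b", re.IGNORECASE),
     "Graduate"),
]


def normalize(text):
    """Return the label of the first rule found anywhere in text, else text unchanged."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return text


for raw in ["b.tech", "pursuing b pharm", "M.Com", "hsc"]:
    print(raw, "->", normalize(raw))
```

With pandas this could be applied as df['qualified'].map(normalize), assuming the column holds only strings; missing values would need separate handling.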