I have a DataFrame column. I need to keep only the values that are exactly 6 digits; every other value should be replaced with "Nil".
Input
data['Post_Code']
629785
588778
760-\63
76063
76063
S4P2Z6
NP443HO
999999999
8
4
3
3
460803
460803
460803
760439
569139
ABVCD
Expected output
data['Cleaned_Post_Code']
629785
588778
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
460803
460803
460803
760439
569139
Nil
How can I do this?
Answer 0 (score: 2)
You can use str.extract with a regular expression: ^ anchors the start of the string, \d{6} matches exactly 6 digits, and $ anchors the end of the string:
data['Cleaned_Post_Code'] = data['Post_Code'].str.extract(r'^(\d{6})$', expand=False)
print(data)
Post_Code Cleaned_Post_Code
0 629785 629785
1 588778 588778
2 760-3 NaN
3 76063 NaN
4 76063 NaN
5 S4P2Z6 NaN
6 NP443HO NaN
7 999999999 NaN
8 8 NaN
9 4 NaN
10 3 NaN
11 3 NaN
12 460803 460803
13 460803 460803
14 460803 460803
15 760439 760439
16 569139 569139
17 ABVCD NaN
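To see why both anchors matter, here is a minimal sketch using only the standard `re` module on a few values from the question. The helper `first_group` is a hypothetical name introduced for illustration; it only approximates what `str.extract` does for a single value.

```python
import re

# Same pattern as in the answer, plus an unanchored variant for comparison.
anchored = re.compile(r'^(\d{6})$')
unanchored = re.compile(r'(\d{6})')

def first_group(pattern, value):
    """Return the first captured group, or None when there is no match."""
    match = pattern.search(value)
    return match.group(1) if match else None

for value in ['629785', '76063', '999999999', 'S4P2Z6']:
    print(value, first_group(anchored, value), first_group(unanchored, value))
```

Without `^` and `$`, a longer value such as 999999999 would still yield its first six digits, which is not what the question asks for.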
If you want to replace the NaN values, add fillna:
data['Cleaned_Post_Code'] = (data['Post_Code'].str.extract(r'^(\d{6})$', expand=False)
                                              .fillna('Nil'))
print(data)
Post_Code Cleaned_Post_Code
0 629785 629785
1 588778 588778
2 760-3 Nil
3 76063 Nil
4 76063 Nil
5 S4P2Z6 Nil
6 NP443HO Nil
7 999999999 Nil
8 8 Nil
9 4 Nil
10 3 Nil
11 3 Nil
12 460803 460803
13 460803 460803
14 460803 460803
15 760439 760439
16 569139 569139
17 ABVCD Nil
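Answer 1 (score: -1)
import numpy as np

# Keep values that are exactly 6 characters long and consist only of digits
is_valid = (data['Post_Code'].str.len() == 6) & data['Post_Code'].str.isdigit()
data['Cleaned_Post_Code'] = np.where(is_valid, data['Post_Code'], 'Nil')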
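
The mask-based idea can also be sketched without NumPy, assuming the column holds strings and a pandas version that provides `str.fullmatch` (1.1 or later). The sample frame below is a shortened copy of the question's data, not the full column.

```python
import pandas as pd

# A few values from the question, stored as strings (an assumption).
data = pd.DataFrame({'Post_Code': ['629785', '760-3', '76063', 'S4P2Z6',
                                   '999999999', '8', '460803', 'ABVCD']})

# True only where the whole value is exactly six digits;
# na=False treats missing values as non-matching.
is_six_digits = data['Post_Code'].str.fullmatch(r'\d{6}', na=False)

# Keep matching values and replace everything else with 'Nil'.
data['Cleaned_Post_Code'] = data['Post_Code'].where(is_six_digits, 'Nil')
print(data)
```

Because `fullmatch` must match the entire string, no explicit `^` or `$` anchors are needed in the pattern.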