How to filter a dataframe column and keep only 6-digit numbers

Date: 2018-01-26 07:37:16

Tags: python regex pandas

I have a dataframe column. I need to keep only 6-digit numbers; all other values should be replaced with "Nil".

Input

data['Post_Code'] 
629785
588778
760-\63
76063
76063
S4P2Z6
NP443HO
999999999
8
4
3
3
460803
460803
460803
760439
569139
ABVCD

Expected output

data['Cleaned_Post_Code']
629785
588778
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
Nil
460803
460803
460803
760439
569139
Nil

How can I do this?

2 Answers:

Answer 0 (score: 2)

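You can use a regular expression, as in this solution: ^ anchors the start of the string, \d{6} matches exactly 6 digits, and $ anchors the end of the string.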

# astype(str) so the .str accessor also works if some codes are stored as integers
data['Cleaned_Post_Code'] = data['Post_Code'].astype(str).str.extract(r'^(\d{6})$', expand=False)
print(data)
    Post_Code Cleaned_Post_Code
0      629785            629785
1      588778            588778
2       760-3               NaN
3       76063               NaN
4       76063               NaN
5      S4P2Z6               NaN
6     NP443HO               NaN
7   999999999               NaN
8           8               NaN
9           4               NaN
10          3               NaN
11          3               NaN
12     460803            460803
13     460803            460803
14     460803            460803
15     760439            760439
16     569139            569139
17      ABVCD               NaN
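The anchors are what make this work. A minimal stdlib sketch (the helper `six_digits` is hypothetical, for illustration only) shows that without `^` and `$`, `\d{6}` also finds six digits inside a longer value such as `999999999`. It also shows one edge case of Python's `re`: `$` matches just before a trailing newline, whereas `re.fullmatch` requires the whole string to match.

```python
import re

UNANCHORED = re.compile(r"\d{6}")
ANCHORED = re.compile(r"^(\d{6})$")


def six_digits(value: str):
    """Hypothetical helper: return the value if it is exactly six digits, else None."""
    m = ANCHORED.search(value)
    return m.group(1) if m else None


# Without anchors, nine digits still contain a six-digit run.
print(bool(UNANCHORED.search("999999999")))  # True
print(six_digits("999999999"))               # None
print(six_digits("629785"))                  # 629785

# '$' also matches before a final newline; fullmatch does not.
print(six_digits("629785\n"))                     # 629785
print(re.fullmatch(r"\d{6}", "629785\n"))         # None
```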

If you want to replace NaN, add fillna:

data['Cleaned_Post_Code'] = (data['Post_Code'].astype(str)
                                              .str.extract(r'^(\d{6})$', expand=False)
                                              .fillna('Nil'))
print(data)
    Post_Code Cleaned_Post_Code
0      629785            629785
1      588778            588778
2       760-3               Nil
3       76063               Nil
4       76063               Nil
5      S4P2Z6               Nil
6     NP443HO               Nil
7   999999999               Nil
8           8               Nil
9           4               Nil
10          3               Nil
11          3               Nil
12     460803            460803
13     460803            460803
14     460803            460803
15     760439            760439
16     569139            569139
17      ABVCD               Nil
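On newer pandas (1.1 and later), the same result can be had without a capture group: `Series.str.fullmatch` checks the whole string, and `Series.where` keeps matching values and replaces the rest. A minimal sketch, assuming the codes are stored as strings or integers (a float column, e.g. one read with missing values, would render as `629785.0` and fail the check); the sample values are a subset of the question's input:

```python
import pandas as pd

# Subset of the question's sample values, stored as strings.
data = pd.DataFrame({"Post_Code": ["629785", "588778", "76063", "S4P2Z6",
                                   "NP443HO", "999999999", "8", "460803", "ABVCD"]})

codes = data["Post_Code"].astype(str)
# fullmatch must match the entire string, so no ^/$ anchors are needed.
is_valid = codes.str.fullmatch(r"\d{6}")
# Keep valid codes; replace everything else with "Nil".
data["Cleaned_Post_Code"] = codes.where(is_valid, "Nil")
print(data)
```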

Answer 1 (score: -1)

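import numpy as np

# astype(str) so the .str accessor also works if some codes are stored as integers
codes = data['Post_Code'].astype(str)
# keep values that are exactly 6 characters long and all digits, else 'Nil'
data['Cleaned_Post_Code'] = np.where((codes.str.len() == 6) & codes.str.isdigit(), codes, 'Nil')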
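One caveat with the `np.where` approach: pandas' `Series.str.isdigit` applies Python's `str.isdigit` to each element, and that also accepts non-ASCII digit characters such as superscript `²`, so a six-character value like `12345²` would pass. A minimal stdlib sketch of a stricter check, assuming only ASCII 0-9 should count (the function name is hypothetical):

```python
def is_six_ascii_digits(value: object) -> bool:
    """Hypothetical helper: True only for exactly six ASCII digits."""
    text = str(value)
    # isascii() (Python 3.7+) rules out characters like '²' that isdigit() accepts.
    return len(text) == 6 and text.isascii() and text.isdigit()


print("12345²".isdigit())             # True
print(is_six_ascii_digits("12345²"))  # False
print(is_six_ascii_digits(629785))    # True
```

It could be applied element-wise with `Series.map`, for example `data['Post_Code'].map(lambda v: v if is_six_ascii_digits(v) else 'Nil')`.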