import pandas as pd
df = pd.DataFrame({'Date': ['nothing ',
                            'This 1A1619 A124 person BL171111 the A-1-24 and ',
                            'dont Z112 but NOT 12-24-1981',
                            'nada here either',
                            'mix: 1A25629Q88 or A13B ok A1 the A16'],
                   'IDs': ['A11', 'B22', 'C33', 'D44', 'E55'],
                   })
This is a follow-up to "pulling mixed letters and numbers". Using this code
pat = r'((?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S))'
df['Date'].str.extractall(pat)
gives me
               0
  match
1 0       1A1619
  1         A124
  2     BL171111
2 0         Z112
4 0   1A25629Q88
  1         A13B
  2           A1
  3          A16
I would like to get NaN for the rows where the regex does not match. So this is what I want:
               0
  match
0            NaN
1 0       1A1619
  1         A124
  2     BL171111
2 0         Z112
3            NaN
4 0   1A25629Q88
  1         A13B
  2           A1
  3          A16
How do I change my code?
Answer 0 (score: 1)
Given that s is the return value of df['Date'].str.extractall(pat), we can do the following:
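import numpy as np
# rows of df for which extractall produced no match at all
i = df.index.difference(s.index.get_level_values(0))
# one NaN placeholder row per unmatched row, using match number 0
o = pd.DataFrame({0: np.nan}, index=[i, [0]*len(i)])
adjust = lambda s,o: pd.concat([s, o]).sort_index()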
Then
>>> adjust(s,o)
               0
  match
0 0          NaN
1 0       1A1619
  1         A124
  2     BL171111
2 0         Z112
3 0          NaN
4 0   1A25629Q88
  1         A13B
  2           A1
  3          A16
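For convenience, the steps above can be put together into one self-contained script. This is only a sketch of the same idea: the helper name extractall_with_nan is made up for illustration, and building the placeholder rows with pd.MultiIndex.from_product and dtype=object (rather than the list-of-lists index used above) is my own choice, made so the filler rows carry the same index names and column dtype as the extractall result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['nothing ',
                            'This 1A1619 A124 person BL171111 the A-1-24 and ',
                            'dont Z112 but NOT 12-24-1981',
                            'nada here either',
                            'mix: 1A25629Q88 or A13B ok A1 the A16'],
                   'IDs': ['A11', 'B22', 'C33', 'D44', 'E55']})

pat = r'((?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S))'


def extractall_with_nan(series, pattern):
    """Hypothetical helper: extractall, plus one NaN row per unmatched entry."""
    s = series.str.extractall(pattern)
    # Index labels of the original series that have no match at all.
    missing = series.index.difference(s.index.get_level_values(0))
    # One placeholder row per unmatched label, with match number 0.
    filler = pd.DataFrame(
        np.nan,
        index=pd.MultiIndex.from_product([missing, [0]], names=s.index.names),
        columns=s.columns,
        dtype=object,
    )
    # Stack matches and placeholders, then restore the original row order.
    return pd.concat([s, filler]).sort_index()


result = extractall_with_nan(df['Date'], pat)
print(result)
```

Rows 0 and 3 of df contain no match, so each should appear once in result with match number 0 and a NaN value, while the rows that did match are left unchanged.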