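I'm new to regex and can't for the life of me figure out how to define this case. I have a column in a df that contains strings; some of them have a particular ending, and I need to extract that ending. Sample df: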
col1
0 Each Wednesday
1 Each 3rd Thursday [EXP 12/31/2019]
2 Each 1st, 4th Friday
3 Each Tuesday [EXP 6/30/219]
4 Each Monday [EXP 3/31/2019]
5 Each 4th Wednesday
Desired df output:
col1 col2
0 Each Wednesday -
1 Each 3rd Thursday [EXP 12/31/2019] EXP 12/31/2019
2 Each 1st, 4th Friday -
3 Each Tuesday [EXP 6/30/219] EXP 6/30/219
4 Each Monday [EXP 3/31/2019] EXP 3/31/2019
5 Each 4th Wednesday -
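I want to extract everything between the [] and put it in a new column. There are plenty of great regex examples on Stack Overflow, but I'm stuck on my particular use case and could use some help.
Any help would be greatly appreciated. Thanks.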
Answer 0 (score: 1)
We can use str.extract here to get everything between the square brackets. Finally, we use fillna to replace NaN with a dash (-):
df['col2'] = df['col1'].str.extract(r'\[(.*)\]', expand=False).fillna('-')
col1 col2
0 Each Wednesday -
1 Each 3rd Thursday [EXP 12/31/2019] EXP 12/31/2019
2 Each 1st, 4th Friday -
3 Each Tuesday [EXP 6/30/219] EXP 6/30/219
4 Each Monday [EXP 3/31/2019] EXP 3/31/2019
5 Each 4th Wednesday -
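For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, and it makes one change that is my own assumption rather than part of the answer above: the pattern uses `[^\]]*` instead of `.*`. Because `.*` is greedy, a row with two bracketed parts would be captured from the first `[` to the last `]`; the negated class stops at the first `]`. On the sample data both patterns should give the same result.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [
        "Each Wednesday",
        "Each 3rd Thursday [EXP 12/31/2019]",
        "Each 1st, 4th Friday",
        "Each Tuesday [EXP 6/30/219]",
        "Each Monday [EXP 3/31/2019]",
        "Each 4th Wednesday",
    ]
})

# Raw string so the backslashes reach the regex engine unchanged.
# [^\]]* captures everything up to the first closing bracket (assumed variant).
# expand=False returns a Series instead of a one-column DataFrame.
df["col2"] = df["col1"].str.extract(r"\[([^\]]*)\]", expand=False).fillna("-")

print(df)
```

Rows without brackets come back as NaN from str.extract, which is why the fillna('-') step is still needed.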