Extract text between special characters

Asked: 2019-06-27 14:09:24

Tags: python pandas

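I'm new to regex and can't for the life of me figure out how to define this case. I have a column of strings in a df; some of them have a bracketed suffix at the end, and I need to extract that suffix. Sample df: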

   col1
0  Each Wednesday
1  Each 3rd Thursday [EXP 12/31/2019]
2  Each 1st, 4th Friday
3  Each Tuesday [EXP 6/30/219]
4  Each Monday [EXP 3/31/2019]
5  Each 4th Wednesday

Desired df output:

   col1                                col2
0  Each Wednesday                      -
1  Each 3rd Thursday [EXP 12/31/2019]  EXP 12/31/2019
2  Each 1st, 4th Friday                -
3  Each Tuesday [EXP 6/30/219]         EXP 6/30/219
4  Each Monday [EXP 3/31/2019]         EXP 3/31/2019
5  Each 4th Wednesday                  -

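I want to extract everything between the [] and put it in a new column. There are plenty of great regex examples on Stack Overflow, but I'm stuck on my particular use case and could use some help.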

Any help would be greatly appreciated. Thanks.

1 Answer:

Answer 0: (score: 1)

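We can use str.extract here to grab everything between the square brackets. Rows without brackets come back as NaN, so at the end we use fillna to replace NaN with a dash (-):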

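# Raw string so the backslashes reach the regex engine as-is; expand=False returns a Series
df['col2'] = df['col1'].str.extract(r'\[(.*)\]', expand=False).fillna('-')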

                                 col1            col2
0                      Each Wednesday               -
1  Each 3rd Thursday [EXP 12/31/2019]  EXP 12/31/2019
2                Each 1st, 4th Friday               -
3         Each Tuesday [EXP 6/30/219]    EXP 6/30/219
4         Each Monday [EXP 3/31/2019]   EXP 3/31/2019
5                  Each 4th Wednesday               -
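
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample df from the question. One thing in it is my own variation and not part of the answer above: the pattern uses the character class [^\]]* instead of .*, on the assumption that you only want the contents of the first bracket pair. The greedy .* runs to the last ] in the string, so a value like "a [x] b [y]" would give "x] b [y" rather than "x". On the sample data both patterns give the same result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": [
        "Each Wednesday",
        "Each 3rd Thursday [EXP 12/31/2019]",
        "Each 1st, 4th Friday",
        "Each Tuesday [EXP 6/30/219]",
        "Each Monday [EXP 3/31/2019]",
        "Each 4th Wednesday",
    ]
})

# Assumption (not from the answer): [^\]]* matches any run of characters
# other than ']', so the capture stops at the first closing bracket.
pattern = r"\[([^\]]*)\]"

# expand=False returns a Series for a single capture group;
# rows with no match are NaN, which fillna turns into '-'.
df["col2"] = df["col1"].str.extract(pattern, expand=False).fillna("-")

print(df)
```

If the dash is only there for display, it may be worth keeping the NaN values instead, since they are easier to filter on later with isna().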