Changing the contents of a pandas column using a regular expression

Time: 2019-01-18 21:38:03

Tags: python pandas dataframe split

I have a DataFrame with a column that looks like this:

Other via Other on 17 Jan   2019 
Other via Other on 17 Jan   2019 
Interview via E-mail    on  14  Dec 2018
Rejected via    E-mail  on  15  Jan 2019
Rejected via    E-mail  on  15  Jan 2019
Rejected via    E-mail  on  15  Jan 2019
Rejected via    E-mail  on  15  Jan 2019
Interview via   E-mail  on  14  Jan 2019
Rejected via Website on 12 Jan  2019

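Is it possible to split this column into two parts, one holding whatever comes before "via" and the other holding whatever comes after "on"? Thanks!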

2 Answers:

Answer 0: (score: 0)

Use str.extract:

df[['col1', 'col2']] = df.col.str.extract(r'(.*)\svia.*on\s(.*)', expand=True)

    col1        col2
0   Other       17 Jan 2019
1   Other       17 Jan 2019
2   Interview   14 Dec 2018
3   Rejected    15 Jan 2019
4   Rejected    15 Jan 2019
5   Rejected    15 Jan 2019
6   Rejected    15 Jan 2019
7   Interview   14 Jan 2019
8   Rejected    12 Jan 2019
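For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same str.extract idea. It is a variation, not the exact pattern above. It assumes the column is named `col`, it uses `\s+` so that runs of spaces are absorbed, and it adds an optional cleanup step that collapses repeated spaces inside the date text. The three sample rows are copied from the question.

```python
import pandas as pd

# Three sample rows copied from the question; the column name "col" is an assumption.
df = pd.DataFrame({"col": [
    "Other via Other on 17 Jan   2019",
    "Interview via E-mail    on  14  Dec 2018",
    "Rejected via Website on 12 Jan  2019",
]})

# Group 1: text before "via". Group 2: text after the standalone word "on".
# The raw string keeps the backslashes intact; \s+ swallows repeated whitespace.
parts = df["col"].str.extract(r"(.*?)\s+via.*\son\s+(.*)", expand=True)
parts.columns = ["col1", "col2"]

# Optional cleanup: collapse runs of whitespace inside the date text.
parts["col2"] = parts["col2"].str.replace(r"\s+", " ", regex=True).str.strip()

print(parts)
```

Rows that do not match the pattern would come back as NaN in both columns, so it may be worth checking `parts.isna().any()` on real data.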

Answer 1: (score: 0)

You can pretty much just use split(), as in df.col.str.split('via|on',expand=True)[[0,2]]

Let me explain in more detail.

Reproducing your DataFrame:

>>> df
                                        col
0         Other via Other on 17 Jan   2019
1         Other via Other on 17 Jan   2019
2  Interview via E-mail    on  14  Dec 2018
3  Rejected via    E-mail  on  15  Jan 2019
4  Rejected via    E-mail  on  15  Jan 2019
5  Rejected via    E-mail  on  15  Jan 2019
6  Rejected via    E-mail  on  15  Jan 2019
7  Interview via   E-mail  on  14  Jan 2019
8      Rejected via Website on 12 Jan  2019
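The output above shows only the printed frame. One way it might be built is sketched below, assuming a single string column named `col` and values copied from the question (trailing spaces dropped).

```python
import pandas as pd

# Values copied from the question, with irregular spacing preserved.
rows = [
    "Other via Other on 17 Jan   2019",
    "Other via Other on 17 Jan   2019",
    "Interview via E-mail    on  14  Dec 2018",
    "Rejected via    E-mail  on  15  Jan 2019",
    "Rejected via    E-mail  on  15  Jan 2019",
    "Rejected via    E-mail  on  15  Jan 2019",
    "Rejected via    E-mail  on  15  Jan 2019",
    "Interview via   E-mail  on  14  Jan 2019",
    "Rejected via Website on 12 Jan  2019",
]

# A single-column DataFrame; "col" is the name used throughout this answer.
df = pd.DataFrame({"col": rows})
print(df)
```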

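First, split the whole column on either of the strings via or on. This breaks column col into three separate columns, 0, 1 and 2: column 0 holds the text before via, column 2 holds the text after on, and column 1 holds the part in between, which we do not need.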

So we are free to select only columns 0 and 2, as follows.

>>> df.col.str.split('via|on',expand=True)[[0,2]]
            0                2
0      Other    17 Jan   2019
1      Other    17 Jan   2019
2  Interview      14  Dec 2018
3   Rejected      15  Jan 2019
4   Rejected      15  Jan 2019
5   Rejected      15  Jan 2019
6   Rejected      15  Jan 2019
7  Interview      14  Jan 2019
8   Rejected      12 Jan  2019

It is better to assign the result to a new DataFrame and rename the columns:

newdf = df.col.str.split('via|on',expand=True)[[0,2]]
newdf.rename(columns={0: 'col1', 2: 'col2'}, inplace=True)
print(newdf)

Result:

         col1             col2
0      Other      17 Jan   2019
1      Other      17 Jan   2019
2  Interview      14  Dec 2018
3   Rejected      15  Jan 2019
4   Rejected      15  Jan 2019
5   Rejected      15  Jan 2019
6   Rejected      15  Jan 2019
7  Interview      14  Jan 2019
8   Rejected      12 Jan  2019
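Two caveats with a bare 'via|on' pattern: the pieces keep their surrounding spaces, and any value that happens to contain "on" or "via" inside a word would be split in the wrong place. The sketch below is one possible way around both, assuming the separators are always surrounded by whitespace. The "Phone" row is hypothetical and was added only to illustrate the second caveat; it is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "Interview via   E-mail  on  14  Jan 2019",
    "Rejected via Website on 12 Jan  2019",
    "Rejected via Phone on 12 Jan 2019",  # hypothetical row, not from the question
]})

# Require whitespace on both sides of each separator. This leaves the "on"
# inside "Phone" alone and drops the padding around each piece.
parts = df["col"].str.split(r"\s+via\s+|\s+on\s+", expand=True)

# Keep the first and last pieces and give them descriptive names.
newdf = parts[[0, 2]].rename(columns={0: "col1", 2: "col2"})

# Optional cleanup: collapse runs of whitespace inside the date text.
newdf = newdf.assign(col2=newdf["col2"].str.replace(r"\s+", " ", regex=True))

print(newdf)
```

With the bare 'via|on' pattern, the hypothetical "Phone" row would produce four pieces instead of three, so selecting columns 0 and 2 would silently return the wrong text for that row.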