I have a DataFrame with a column that looks like this:
Other via Other on 17 Jan 2019
Other via Other on 17 Jan 2019
Interview via E-mail on 14 Dec 2018
Rejected via E-mail on 15 Jan 2019
Rejected via E-mail on 15 Jan 2019
Rejected via E-mail on 15 Jan 2019
Rejected via E-mail on 15 Jan 2019
Interview via E-mail on 14 Jan 2019
Rejected via Website on 12 Jan 2019
Is it possible to split this column into two parts, one holding the text before "via" and the other holding the text after "on"? Thanks!
Answer 0 (score: 0)
Use str.extract:
df[['col1', 'col2']] = df.col.str.extract(r'(.*)\svia.*on\s(.*)', expand=True)
col1 col2
0 Other 17 Jan 2019
1 Other 17 Jan 2019
2 Interview 14 Dec 2018
3 Rejected 15 Jan 2019
4 Rejected 15 Jan 2019
5 Rejected 15 Jan 2019
6 Rejected 15 Jan 2019
7 Interview 14 Jan 2019
8 Rejected 12 Jan 2019
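For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same str.extract idea. The sample rows are copied from the question; the group names status and date, the anchored pattern, and the final conversion with pd.to_datetime are my own assumptions for illustration and are not part of the answer above.

```python
import pandas as pd

# Three sample rows taken from the question.
df = pd.DataFrame({"col": [
    "Other via Other on 17 Jan 2019",
    "Interview via E-mail on 14 Dec 2018",
    "Rejected via Website on 12 Jan 2019",
]})

# Named groups become the column names of the extracted frame.
# "status" and "date" are illustrative names, not from the original answer.
pattern = r"^(?P<status>.*?)\s+via\s+.*\s+on\s+(?P<date>.*)$"
parts = df["col"].str.extract(pattern)

# Optional step (an assumption about what you want next):
# parse the date text, which uses a day / abbreviated month / year layout.
parts["date"] = pd.to_datetime(parts["date"], format="%d %b %Y")

print(parts)
```

Requiring whitespace around via and on means the pattern will not match those letters inside another word.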
Answer 1 (score: 0)
You can simply use split(), as in df.col.str.split('via|on', expand=True)[[0, 2]].
To explain in detail, first reproduce your DataFrame:
>>> df
col
0 Other via Other on 17 Jan 2019
1 Other via Other on 17 Jan 2019
2 Interview via E-mail on 14 Dec 2018
3 Rejected via E-mail on 15 Jan 2019
4 Rejected via E-mail on 15 Jan 2019
5 Rejected via E-mail on 15 Jan 2019
6 Rejected via E-mail on 15 Jan 2019
7 Interview via E-mail on 14 Jan 2019
8 Rejected via Website on 12 Jan 2019
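First, split the whole column on the strings via and on. This breaks the column col into three separate columns, 0, 1 and 2: column 0 holds the text before via, column 2 holds the text after on, and column 1 holds whatever lies in between. Since only the first and last parts are required, we can simply select columns 0 and 2, as follows.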
>>> df.col.str.split('via|on',expand=True)[[0,2]]
0 2
0 Other 17 Jan 2019
1 Other 17 Jan 2019
2 Interview 14 Dec 2018
3 Rejected 15 Jan 2019
4 Rejected 15 Jan 2019
5 Rejected 15 Jan 2019
6 Rejected 15 Jan 2019
7 Interview 14 Jan 2019
8 Rejected 12 Jan 2019
It is better to assign the result to a new DataFrame and rename the columns:
newdf = df.col.str.split('via|on', expand=True)[[0, 2]]
newdf = newdf.rename(columns={0: 'col1', 2: 'col2'})
print(newdf)
Result:
col1 col2
0 Other 17 Jan 2019
1 Other 17 Jan 2019
2 Interview 14 Dec 2018
3 Rejected 15 Jan 2019
4 Rejected 15 Jan 2019
5 Rejected 15 Jan 2019
6 Rejected 15 Jan 2019
7 Interview 14 Jan 2019
8 Rejected 12 Jan 2019
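One caveat with the bare 'via|on' pattern: the pieces keep the spaces that surrounded the separators, and the pattern would also split inside any word that happens to contain "on". Below is a hedged sketch of a stricter variant that only splits where via or on is surrounded by whitespace. The "Phone" row is a hypothetical value I added to show the difference; it does not appear in the question's data.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "Other via Other on 17 Jan 2019",
    "Interview via Phone on 14 Dec 2018",  # hypothetical row: "Phone" contains "on"
]})

# Split only on " via " / " on " as whole words; the surrounding
# whitespace is consumed by the separator, so no stripping is needed.
parts = df["col"].str.split(r"\s+via\s+|\s+on\s+", expand=True)

# Keep the first and last pieces and give them readable names.
newdf = parts[[0, 2]].rename(columns={0: "col1", 2: "col2"})
print(newdf)
```

With the bare pattern, the hypothetical "Phone" row would produce an extra piece and column 2 would no longer hold the date.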