Question

Example code:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None

Is it possible to split without stripping the string that the split pattern matches? The desired output would be:

Out[3]:
         0     1     2     3
0     this   #is    #a #test
1  another #test  None  None

Edit 1: The actual use case is to keep the matched pattern, for example:

serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)

In my case, the [A-Z]+ matches are processing steps, and I want to keep them for further processing.

Answer 1

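You can split on a positive lookahead. A lookahead matches a position without consuming any characters, so each split happens just before the # and the delimiter stays attached to the piece that follows it.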

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(?=#)', expand=True))

Output:

         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
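The same idea can be sketched for the pattern from Edit 1: move the part you want to keep into the lookahead and leave only the newline outside it. The sample text and the step names LOAD and CLEAN below are made up for illustration, and dtype=object is an assumption on my part so that Python's re engine (which supports lookaheads) handles the pattern.

```python
import pandas as pd

# Hypothetical log text; the step names LOAD and CLEAN are invented for this sketch.
logs = pd.Series(
    ["header\n*** LOAD rows=3\n*** CLEAN ok", "intro\n*** LOAD rows=0"],
    dtype=object,  # assumption: plain Python strings, so Python's re does the splitting
)

# Split at each newline that is followed by a "*** STEP" marker.
# The newline is consumed; the marker inside (?=...) is only checked, so it is kept.
parts = logs.str.split(r"\n(?=\*\*\* [A-Z]+)", expand=True)

# Row 0 becomes: "header", "*** LOAD rows=3", "*** CLEAN ok"
# Row 1 becomes: "intro", "*** LOAD rows=0", and a missing value
print(parts)
```

Each step marker stays at the start of its own column, so it is still available for further processing.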

Answer 2

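Try str.split('(#[a-z]+)', expand=True). Because the pattern is wrapped in a capturing group, the matched text is kept in the result as its own elements.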

For example:

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(#[a-z]+)', expand=True))
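One thing to be aware of with the capturing-group approach: when two matches are adjacent, the split also yields empty strings between them. The sketch below shows this and one possible way to drop the empties; the names pieces and cleaned are mine, and dtype=object is an assumption so that Python's re module does the splitting.

```python
import pandas as pd

serie = pd.Series(["this#is#a#test", "another#test"], dtype=object)

# Without expand=True each element becomes a list of pieces.
pieces = serie.str.split(r"(#[a-z]+)")
# pieces.iloc[0] is ['this', '#is', '', '#a', '', '#test', '']

# Drop the empty strings left between (and after) adjacent matches.
cleaned = pieces.apply(lambda items: [item for item in items if item])
# cleaned.iloc[0] is ['this', '#is', '#a', '#test']

print(cleaned)
```

If you need columns rather than lists, the lookahead split from Answer 1 avoids the empty strings in the first place.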

Answer 3

Simply add the delimiter back to every element after splitting:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True) + '#'
Out[3]:
          0      1    2      3
0     this#    is#   a#  test#
1  another#  test#  NaN    NaN

In [4]: '#' + serie.str.split('#', expand=True)
Out[4]:
          0      1    2      3
0     #this    #is   #a  #test
1  #another  #test  NaN    NaN
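Both variants above also change the first column, which differs from the output asked for in the question. A minimal sketch of one way to match it, assuming a fixed literal delimiter, is to prefix only the columns after the first; the name restored is mine.

```python
import pandas as pd

serie = pd.Series(["this#is#a#test", "another#test"])

# Plain split: the "#" delimiters are removed.
parts = serie.str.split("#", expand=True)

# Put "#" back in front of every piece except those in the first column.
# Missing values (rows with fewer pieces) are left untouched.
restored = parts.copy()
for col in restored.columns[1:]:
    restored[col] = parts[col].map(
        lambda value: "#" + value if isinstance(value, str) else value
    )

print(restored)
```

This only works when the delimiter is a constant string; for a regex pattern such as the one in Edit 1, the matched text varies, so the lookahead split is the better fit.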

pandas str.split without stripping the split pattern

3 answers: