Regular expressions in pandas strings (split)

Time: 2020-10-28 10:23:31

Tags: python regex split

Hello, I have a list of strings, for example:

liste_to_split=['NW_011625257.1_0','scaffold1_3','scaffold3']

I would like to split each of them at Number_Number. I tried:

for i in liste_to_split:
 i.split(r'(?<=[0-9])_')

I get

['NW_011625257.1_0']
['scaffold1_3']
['scaffold3']

instead of

['NW_011625257.1'] ['0']
['scaffold1'] ['3']
['scaffold3']

Does anyone know what the problem is?

2 answers:

Answer 0: (score: 1)

You can use:

>>> import re
>>> liste_to_split=['NW_011625257.1_0','scaffold1_3','scaffold3']
>>> 
>>> for i in liste_to_split:
...     re.split(r'(?<=[0-9])_', i)
...
['NW_011625257.1', '0']
['scaffold1', '3']
['scaffold3']

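Note the use of re.split instead of str.split: str.split treats its argument as a literal separator, not as a regular expression. Placing _ after the lookbehind assertion means the match consumes the underscore, so we are not splitting on a zero-width match.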
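
The difference can be seen in a minimal sketch (the variable names are illustrative and not part of the original post). The same pattern string is passed to both functions; only re.split interprets it as a regular expression.

```python
import re

pattern = r'(?<=[0-9])_'   # an underscore that directly follows a digit
name = 'NW_011625257.1_0'

# str.split treats the separator as literal text, so it searches for the
# characters "(?<=[0-9])_" themselves and finds nothing to split on.
literal_result = name.split(pattern)

# re.split compiles the same string as a regular expression.
regex_result = re.split(pattern, name)

print(literal_result)  # ['NW_011625257.1_0']
print(regex_result)    # ['NW_011625257.1', '0']
```

The first underscore in the string follows the letter W rather than a digit, so the lookbehind rejects it and only the last underscore acts as a separator.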


Based on the OP's comment below, it seems the OP wants to apply this split to a DataFrame column.

Suppose this is your DataFrame:

>>> print (df)
             column
0  NW_011625257.1_0
1       scaffold1_3
2         scaffold3

Then you can use:

>>> print (df['column'].str.split(r'(?<=[0-9])_', expand=True))
                0     1
0  NW_011625257.1     0
1       scaffold1     3
2       scaffold3  None
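
For reference, here is a self-contained sketch of the DataFrame case. It assumes pandas is installed. The construction of df is a hypothetical reconstruction, since the post only shows its printed form, and dtype=object is an assumption made here so that the pattern is handled by Python's re module whatever the default string backend is.

```python
import pandas as pd

# Hypothetical reconstruction of the DataFrame printed above.
df = pd.DataFrame(
    {'column': ['NW_011625257.1_0', 'scaffold1_3', 'scaffold3']},
    dtype=object,
)

# A pattern longer than one character is treated as a regular expression
# by default; expand=True spreads the pieces across columns and fills
# rows that have fewer pieces with a missing value.
parts = df['column'].str.split(r'(?<=[0-9])_', expand=True)
print(parts)
```

pandas 1.4 and later also accept regex=True in str.split to make the regular-expression interpretation explicit.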

Answer 1: (score: 1)

l=['NW_011625257.1_0','scaffold1_3','scaffold3']

for i in l:
  f = i.split('_')
  print(f) 

Output:

['NW', '011625257.1', '0']
['scaffold1', '3']
['scaffold3']
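
Splitting on every underscore also breaks NW_011625257.1_0 at its first underscore, which differs from the output requested in the question. A small sketch of the difference, using a lookbehind pattern that splits only after a digit (the helper name split_after_digit is made up for illustration):

```python
import re

def split_after_digit(text):
    """Split only at underscores that directly follow a digit."""
    return re.split(r'(?<=[0-9])_', text)

names = ['NW_011625257.1_0', 'scaffold1_3', 'scaffold3']

for name in names:
    # Left: split at every underscore; right: split only after a digit.
    print(name.split('_'), split_after_digit(name))
```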