Hello, I have a df such as:
COL1
NW_011625257.1_0
NW_011623521.1_1
NW_011623521.3_1
NW_011623521.4_1
NW_011623521.1
JZSA01007324.1_2
scaffold_1463_2
scaffold_1463
I would like to split off the part after the last '_' and get:
COL1            COL2
NW_011625257.1  0
NW_011623521.1  1
NW_011623521.3  1
NW_011623521.4  1
NW_011623521.1  NaN
JZSA01007324.1  2
scaffold_1463   2
scaffold_1463   NaN
So far, I have tried:
df[['COL1','COL2']] = df.COL1.str.split(r'_(?!.*_)', expand=True)
Instead, I got a different output from the table above.
The values listed above are the example I want to handle.
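To see why the attempted pattern falls short, here is a minimal sketch using only the standard `re` module rather than pandas. The names `last_underscore`, `samples` and `attempt` are illustrative and not part of the original post. The pattern `_(?!.*_)` matches the last underscore of every value, so values without a numeric suffix are split as well.

```python
import re

# Pattern from the attempt: an underscore that has no later underscore after it.
last_underscore = re.compile(r"_(?!.*_)")

# A few of the values from the question (illustrative subset).
samples = ["NW_011625257.1_0", "NW_011623521.1", "scaffold_1463_2", "scaffold_1463"]

# Split each value the way the attempted pattern would.
attempt = {value: last_underscore.split(value) for value in samples}

for value, parts in attempt.items():
    print(value, "->", parts)
# NW_011623521.1 -> ['NW', '011623521.1']   (unwanted split)
# scaffold_1463  -> ['scaffold', '1463']    (unwanted split)
```

The two values that should stay whole are cut at their only underscore, which is why the result does not match the desired table.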
Answer 0 (score: 2)
You can use
df[['COL1','COL2']] = df.COL1.str.split(r"(?<=\d)_(?=\d+$)", expand=True)
See the regex demo.
Pattern details:
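(?<=\d) - there must be a digit immediately before the current position
_ - an underscore
(?=\d+$) - immediately to the right of the current position there must be one or more digits followed by the end of the string.
Pandas test: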
>>> import pandas as pd
>>> df = pd.DataFrame({'COL1': ['NW_011625257.1_0','NW_011623521.1_1','NW_011623521.3_1','NW_011623521.4_1','NW_011623521.1','JZSA01007324.1_2','scaffold_1463_2','scaffold_1463']})
>>> df[['COL2','COL3']] = df.COL1.str.split(r"(?<=\d)_(?=\d+$)", expand=True)
>>> df
               COL1            COL2  COL3
0  NW_011625257.1_0  NW_011625257.1     0
1  NW_011623521.1_1  NW_011623521.1     1
2  NW_011623521.3_1  NW_011623521.3     1
3  NW_011623521.4_1  NW_011623521.4     1
4    NW_011623521.1  NW_011623521.1  None
5  JZSA01007324.1_2  JZSA01007324.1     2
6   scaffold_1463_2   scaffold_1463     2
7     scaffold_1463   scaffold_1463  None
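The role of each part of the pattern can also be checked outside pandas. The sketch below uses only the standard `re` module. The helper `split_suffix` and the constants `SUFFIX` and `NO_LOOKBEHIND` are hypothetical names introduced for illustration. It suggests that the lookbehind `(?<=\d)` is what keeps `scaffold_1463` intact, while the lookahead `(?=\d+$)` is what keeps `NW_011623521.1` intact.

```python
import re
from typing import Optional, Tuple

# Pattern from the answer: underscore preceded by a digit and followed
# only by digits up to the end of the string.
SUFFIX = re.compile(r"(?<=\d)_(?=\d+$)")

# Same pattern without the lookbehind, for comparison only.
NO_LOOKBEHIND = re.compile(r"_(?=\d+$)")


def split_suffix(value: str) -> Tuple[str, Optional[str]]:
    """Return (base, suffix); suffix is None when no numeric suffix is found."""
    parts = SUFFIX.split(value, maxsplit=1)
    suffix = parts[1] if len(parts) == 2 else None
    return parts[0], suffix


print(split_suffix("NW_011625257.1_0"))   # ('NW_011625257.1', '0')
print(split_suffix("scaffold_1463"))      # ('scaffold_1463', None)

# Without the lookbehind, the 'd' before the underscore no longer blocks the split.
print(NO_LOOKBEHIND.split("scaffold_1463"))   # ['scaffold', '1463']
# The '.' after the digits makes the lookahead fail, so this value stays whole.
print(NO_LOOKBEHIND.split("NW_011623521.1"))  # ['NW_011623521.1']
```

One limitation worth noting: a value whose base name ends in a digit and whose remainder is purely numeric, such as a hypothetical `ABC1_123`, would also be split, so the pattern relies on the naming scheme shown in the question.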