I want to extract the requirement number from a DataFrame column named "Linked Issues". The column contains strings in the following format:
Linked Issues
Requirement-12345, NewPr-8795, OldPr-78941
MSR-85749, Requirement-74852, NewPr-95418
Requirement-894895
OldPr-85974, NewPr-968572, Requirement-985785
Expected result:
I want to store the requirement numbers in a new column, like this:
Requirement Number
Requirement-12345
Requirement-74852
Requirement-894895
Requirement-985785
Answer 0 (score: 1)
Use Series.str.extract with the regex r'(Requirement-\d+)' to get the first matching value in each row:
df['new'] = df['Linked Issues'].str.extract(r'(Requirement-\d+)')
print (df)
                                   Linked Issues                 new
0     Requirement-12345, NewPr-8795, OldPr-78941   Requirement-12345
1      MSR-85749, Requirement-74852, NewPr-95418   Requirement-74852
2                             Requirement-894895  Requirement-894895
3  OldPr-85974, NewPr-968572, Requirement-985785  Requirement-985785
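For anyone who wants to reproduce this, here is a minimal self-contained sketch of the extract approach. It rebuilds the sample data from the question and adds one hypothetical row with no requirement ID (not in the original data) to show that unmatched rows come back as NaN. Passing expand=False is a choice made here so the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row
# (the last one) that contains no requirement ID at all.
df = pd.DataFrame({
    "Linked Issues": [
        "Requirement-12345, NewPr-8795, OldPr-78941",
        "MSR-85749, Requirement-74852, NewPr-95418",
        "Requirement-894895",
        "OldPr-85974, NewPr-968572, Requirement-985785",
        "NewPr-1111",  # hypothetical: no match expected
    ]
})

# str.extract returns the first match of the capture group per row;
# expand=False makes it return a Series instead of a DataFrame.
df["Requirement Number"] = df["Linked Issues"].str.extract(
    r"(Requirement-\d+)", expand=False
)

print(df["Requirement Number"])
```

Rows without a match hold NaN, so they can be filtered with df["Requirement Number"].notna() if needed.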
If a row may contain multiple requirement values, combine Series.str.findall with Series.str.join:
df['new'] = df['Linked Issues'].str.findall(r'(Requirement-\d+)').str.join(', ')
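A small sketch of the multi-match case, assuming a row can list more than one requirement. The sample rows below are hypothetical and are not taken from the question. The capture group is dropped from the pattern here, because findall already returns the whole match when there is no group.

```python
import pandas as pd

# Hypothetical rows: two requirements, one requirement, none.
df = pd.DataFrame({
    "Linked Issues": [
        "Requirement-12345, NewPr-8795, Requirement-67890",
        "MSR-85749, Requirement-74852",
        "NewPr-95418",
    ]
})

# findall collects every match per row into a list;
# str.join then flattens each list into one comma-separated string.
df["new"] = df["Linked Issues"].str.findall(r"Requirement-\d+").str.join(", ")

print(df["new"].tolist())
```

Note that, unlike extract, a row with no match ends up as an empty string rather than NaN, because joining an empty list yields "".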