I have a column like this in a Pandas DataFrame (dtype='O'):
Column_string
! 111 PATTERN1 .......,,,,,,.... !444PATTERN2
! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3
! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5
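I want to extract the value to the left of a pattern, going back as far as the '!'. For example, if I am looking for PATTERN1, the desired result is: 111.
I want to create new columns based on specific patterns. So the desired output (if I am only looking for PATTERN1 and PATTERN2) is: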
Column_string PATTERN1 PATTERN2
! 111 PATTERN1 .......,,,,,,.... !444PATTERN2 111 444
! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3 none none
! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5 none none
Answer 0 (score: 1)
##sample df
Column_string
0 ! 111 PATTERN1 .......,,,,,,.... !444PATTERN2
1 ! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3
2 ! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5
3 3434 PATTERN .................... 435 PATTERN
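import pandas as pd
## capture the digits that follow '!' and precede any PATTERN keyword; one output column per match position
patterns = pd.DataFrame(df['Column_string'].str.findall(r'(?<=!)\s*(\d+)\s*(?=PATTERN)').tolist()).rename({0:'PATTERN1',1:'PATTERN2'},axis=1)
df.join(patterns)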
Column_string PATTERN1 PATTERN2
0 ! 111 PATTERN1 .......,,,,,,.... !444PATTERN2 111 444
1 ! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3 222 555
2 ! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5 333 666
3 3434 PATTERN .................... 435 PATTERN None None
Note: if the PATTERN keywords in the string stand for specific patterns, the approach below works:
## extract the number only where PATTERN1 or PATTERN2 is present
print(df.join(pd.DataFrame(df['Column_string'].str.findall(r'(?<=!)\s*(\d+)\s*(?=PATTERN1|PATTERN2)').tolist()).rename({0:'PATTERN1',1:'PATTERN2'},axis=1)))
Column_string PATTERN1 PATTERN2
0 ! 111 PATTERN1 .......,,,,,,.... !444PATTERN2 111 444
1 ! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3 None None
2 ! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5 None None
3 3434 PATTERN .................... 435 PATTERN None None
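One caveat with the `findall` approach: the new columns are named by the position of each match, not by which keyword matched, so a row containing only PATTERN2 would have its number placed under PATTERN1. A minimal sketch of a per-keyword alternative using `Series.str.extract` follows. The helper name `extract_before` and the trailing `\b` (so that PATTERN1 does not also match something like PATTERN10) are my own assumptions, not part of the original answer.

```python
import re
import pandas as pd

# Rebuild the sample frame from the answer so the sketch is self-contained.
df = pd.DataFrame({
    "Column_string": [
        "! 111 PATTERN1 .......,,,,,,.... !444PATTERN2",
        "! 222 PATTERN3 .......,,,,,,.... !555 PATTERN3",
        "! 333 PATTERN4 .......,,,,,,.... !666 PATTERN5",
        "3434 PATTERN .................... 435 PATTERN",
    ]
})

def extract_before(series, keyword):
    """Hypothetical helper: digits between a '!' and the given keyword."""
    # '!' then optional spaces, the captured digits, optional spaces, the keyword.
    regex = r"!\s*(\d+)\s*" + re.escape(keyword) + r"\b"
    # expand=False returns a Series; rows without a match become NaN.
    return series.str.extract(regex, expand=False)

result = df.copy()
for keyword in ["PATTERN1", "PATTERN2"]:
    # Each column is tied to its own keyword, regardless of match order.
    result[keyword] = extract_before(df["Column_string"], keyword)

print(result)
```

Here each keyword gets its own regex, so the column a value lands in depends only on the keyword that follows it. Rows without a match hold NaN rather than None.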