Search for a pattern in a pandas column and extract the value to the left of the pattern

Time: 2019-08-21 08:45:23

Tags: python string pandas

I have a column like this in a pandas DataFrame (dtype = "O"):

Column_string 
! 111  PATTERN1   .......,,,,,,.... !444PATTERN2
! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3
! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5

I want to extract the value to the left of the pattern, going back as far as the '!'. For example, if I am looking for PATTERN1, the desired result is 111.

I want to create new columns based on specific patterns. The desired output, if I am only looking for PATTERN1 and PATTERN2, is:

Column_string                                            PATTERN1  PATTERN2
! 111  PATTERN1   .......,,,,,,.... !444PATTERN2         111       444
! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3        none      none 
! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5        none      none 

1 Answer:

Answer 0 (score: 1)

Use str.findall

##sample df
                                      Column_string 
0   ! 111  PATTERN1   .......,,,,,,.... !444PATTERN2
1  ! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3
2  ! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5
3      3434 PATTERN .................... 435 PATTERN

# Capture only the digits between '!' and the following PATTERN keyword
matches = df['Column_string '].str.findall(r'(?<=!)\s*(\d+)\s*(?=PATTERN)')
patterns = pd.DataFrame(matches.tolist()).rename({0:'PATTERN1',1:'PATTERN2'},axis=1)
df.join(patterns)

                                      Column_string  PATTERN1 PATTERN2
0   ! 111  PATTERN1   .......,,,,,,.... !444PATTERN2    111       444
1  ! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3    222       555 
2  ! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5    333       666 
3      3434 PATTERN .................... 435 PATTERN    None     None
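
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question (including the trailing space in the column name), and the variable names `matches`, `found` and `result` are only illustrative. It assumes the number always sits between a '!' and the PATTERN keyword.

```python
import pandas as pd

# Sample data rebuilt from the question; the column name ends with a space.
df = pd.DataFrame({
    'Column_string ': [
        '! 111  PATTERN1   .......,,,,,,.... !444PATTERN2',
        '! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3',
        '! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5',
        '3434 PATTERN .................... 435 PATTERN',
    ]
})

# str.findall returns one list per row; with a single capture group,
# each list contains only the captured digits (no surrounding spaces).
matches = df['Column_string '].str.findall(r'!\s*(\d+)\s*(?=PATTERN)')

# Expand the per-row lists into columns; rows without a match get missing values.
found = pd.DataFrame(matches.tolist(), index=df.index)
found = found.rename(columns={0: 'PATTERN1', 1: 'PATTERN2'})

result = df.join(found)
print(result)
```

The last row has no '!' in front of its numbers, so it produces an empty list and ends up with missing values in both new columns.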

Note: if the PATTERN keyword in the string stands for specific pattern names, the following works:

##extract the number only where PATTERN1 or PATTERN2 is present
print(df.join(pd.DataFrame(df['Column_string '].str.findall(r'(?<=!)\s*(\d+)\s*(?=PATTERN1|PATTERN2)').tolist()).rename({0:'PATTERN1',1:'PATTERN2'},axis=1)))

                                      Column_string  PATTERN1 PATTERN2
0   ! 111  PATTERN1   .......,,,,,,.... !444PATTERN2    111        444
1  ! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3     None     None
2  ! 333  PATTERN4   .......,,,,,,.... !666 PATTERN5     None     None
3      3434 PATTERN .................... 435 PATTERN     None     None
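
One caveat with the findall approach: the new columns are filled by match position, not by pattern name, so a row that contains only PATTERN2 would put its number in the PATTERN1 column. A possible alternative, sketched here under assumptions, is to run str.extract once per pattern name. The `wanted` list and the last sample row are hypothetical additions for illustration and are not part of the original question.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Column_string ': [
        '! 111  PATTERN1   .......,,,,,,.... !444PATTERN2',
        '! 222  PATTERN3   .......,,,,,,.... !555 PATTERN3',
        # Hypothetical row: PATTERN2 appears, PATTERN1 does not.
        '! 777 PATTERN2   .......,,,,,,.... !888 PATTERN9',
    ]
})

wanted = ['PATTERN1', 'PATTERN2']  # hypothetical list of names to look for

result = df.copy()
for name in wanted:
    # Digits after '!' that are followed (ignoring spaces) by this exact name.
    # \b keeps PATTERN1 from also matching a longer name such as PATTERN10.
    regex = r'!\s*(\d+)\s*' + re.escape(name) + r'\b'
    # With one capture group and expand=False, str.extract returns a Series
    # holding the first match per row, or a missing value when nothing matches.
    result[name] = df['Column_string '].str.extract(regex, expand=False)

print(result)
```

Because each column is built from its own regex, the 777 in the hypothetical row lands under PATTERN2 while PATTERN1 stays missing. Note that str.extract keeps only the first match per row for each name.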