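Extract strings starting with "Unit" from a column and copy them to a new column: Pandas

Time: 2018-11-28 04:31:57

Tags: python regex pandas

Below is what my input data looks like. Using pandas / python / regex, I want to extract every string that starts with "Unit" into a new column B, placed on the same row as the original value in column A. Any help would be greatly appreciated.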

Input:

   A
MARYLAND
Unit6
Unit7
Unit8
NEW SECTOR
Unit1
Unit2
NORTH SECTOR
Unit1
Unit2
PVT SECTOR
PUBLIC SECTOR
Unit1
Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
Unit1
Unit2
Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON


Output:

   A              B
MARYLAND            
Unit6           Unit6
Unit7           Unit7
Unit8           Unit8
NEW SECTOR          
Unit1           Unit1
Unit2           Unit2
NORTH SECTOR            
Unit1           Unit1
Unit2           Unit2
PVT SECTOR          
PUBLIC SECTOR           
Unit1           Unit1
Unit2           Unit2
CENTRAL SECTOR          
THERMAL         
SOUTH SECTOR            
Unit1           Unit1
Unit2           Unit2
Unit3           Unit3
ACCOUNT SECTOR          
DOLBY DIGITAL           
WASHINGTON          

Finally, once the "Unit" strings have been copied to the new column, I want to remove those values from column A:

    A            B
MARYLAND            
                Unit6
                Unit7
                Unit8
NEW SECTOR          
                Unit1
                Unit2
NORTH SECTOR            
                Unit1
                Unit2
PVT SECTOR          
PUBLIC SECTOR           
                Unit1
                Unit2
CENTRAL SECTOR          
THERMAL         
SOUTH SECTOR            
                Unit1
                Unit2
                Unit3
ACCOUNT SECTOR          
DOLBY DIGITAL           
WASHINGTON  

2 Answers:

Answer 0: (score: 1)

Use str.extract and fillna:

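# Raw string keeps \d intact; expand=False returns a Series (NaN where there is no match)
df['B'] = df['A'].str.extract(r'(^Unit\d+)', expand=False)
df.loc[df['B'].notnull(), 'A'] = ''
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df['B'] = df['B'].fillna('')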

print(df)
                 A      B
0         MARYLAND       
1                   Unit6
2                   Unit7
3                   Unit8
4       NEW SECTOR       
5                   Unit1
6                   Unit2
7     NORTH SECTOR       
8                   Unit1
9                   Unit2
10      PVT SECTOR       
11   PUBLIC SECTOR       
12                  Unit1
13                  Unit2
14  CENTRAL SECTOR       
15         THERMAL       
16    SOUTH SECTOR       
17                  Unit1
18                  Unit2
19                  Unit3
20  ACCOUNT SECTOR       
21   DOLBY DIGITAL       
22      WASHINGTON       
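
For readers who want to try this end to end, here is a minimal self-contained sketch of the same str.extract idea. The five-row frame is a shortened stand-in for the question's data, and the intermediate name `units` is introduced here only for illustration; it is not part of the original answer.

```python
import pandas as pd

# Shortened sample of the question's column A (an assumption for brevity)
df = pd.DataFrame({"A": ["MARYLAND", "Unit6", "Unit7", "NEW SECTOR", "Unit1"]})

# One capture group with expand=False yields a Series;
# rows that do not start with "Unit<digits>" become NaN
units = df["A"].str.extract(r"^(Unit\d+)", expand=False)

# Fill column B, then blank out the matched rows in column A
df["B"] = units.fillna("")
df.loc[units.notna(), "A"] = ""

print(df)
```

Keeping the extracted values in a separate Series means the match mask (`units.notna()`) is still available after column B has been filled with empty strings.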

Answer 1: (score: 0)

Another approach, indexing column A with a boolean mask built from column A itself:

df["B"] = df["A"][df['A'].str.contains('^Unit', regex=True)]
df["B"] = df["B"].fillna("")

                 A      B
0         MARYLAND
1            Unit6  Unit6
2            Unit7  Unit7
3            Unit8  Unit8
4       NEW SECTOR
5            Unit1  Unit1
6            Unit2  Unit2
7     NORTH SECTOR
8            Unit1  Unit1
9            Unit2  Unit2
10      PVT SECTOR
11   PUBLIC SECTOR
12           Unit1  Unit1
13           Unit2  Unit2
14  CENTRAL SECTOR
15         THERMAL
16    SOUTH SECTOR
17           Unit1  Unit1
18           Unit2  Unit2
19           Unit3  Unit3
20  ACCOUNT SECTOR
21   DOLBY DIGITAL
22      WASHINGTON
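
The output above still has the "Unit" values in column A, so it covers only the first half of the question. A possible way to finish the job with the same boolean-mask idea is sketched below. It swaps in `str.startswith`, `Series.where` and `Series.mask` rather than the regex used in the answer, and the sample frame and the name `is_unit` are assumptions made for illustration.

```python
import pandas as pd

# Shortened sample of the question's column A (an assumption for brevity)
df = pd.DataFrame({"A": ["PVT SECTOR", "PUBLIC SECTOR", "Unit1", "Unit2", "THERMAL"]})

# True for rows whose value starts with "Unit"; na=False treats missing values as non-matches
is_unit = df["A"].str.startswith("Unit", na=False)

# where() keeps values where the mask is True and substitutes "" elsewhere
df["B"] = df["A"].where(is_unit, "")

# mask() is the inverse: it replaces values where the mask is True
df["A"] = df["A"].mask(is_unit, "")

print(df)
```

Column B is built before column A is overwritten, so the order of the two assignments matters.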