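Below is what my input data looks like. Using pandas / python / regex, I want to extract every string that starts with "Unit" into a new second column (B), in the same row where the word appears in column A. Any help would be appreciated.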
Input:
A
MARYLAND
Unit6
Unit7
Unit8
NEW SECTOR
Unit1
Unit2
NORTH SECTOR
Unit1
Unit2
PVT SECTOR
PUBLIC SECTOR
Unit1
Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
Unit1
Unit2
Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON
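To make the question reproducible, here is a minimal sketch that builds the sample input above as a DataFrame, assuming the single text column is named "A" as in the listing:

```python
import pandas as pd

# Sample input from the question (assumption: one text column called "A")
values = [
    "MARYLAND", "Unit6", "Unit7", "Unit8", "NEW SECTOR", "Unit1", "Unit2",
    "NORTH SECTOR", "Unit1", "Unit2", "PVT SECTOR", "PUBLIC SECTOR",
    "Unit1", "Unit2", "CENTRAL SECTOR", "THERMAL", "SOUTH SECTOR",
    "Unit1", "Unit2", "Unit3", "ACCOUNT SECTOR", "DOLBY DIGITAL", "WASHINGTON",
]
df = pd.DataFrame({"A": values})
print(df.shape)  # (23, 1)
```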
Output:
A B
MARYLAND
Unit6 Unit6
Unit7 Unit7
Unit8 Unit8
NEW SECTOR
Unit1 Unit1
Unit2 Unit2
NORTH SECTOR
Unit1 Unit1
Unit2 Unit2
PVT SECTOR
PUBLIC SECTOR
Unit1 Unit1
Unit2 Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
Unit1 Unit1
Unit2 Unit2
Unit3 Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON
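For this first step (copy only), a regex is not strictly required. A minimal sketch, assuming a plain prefix test is acceptable, uses Series.str.startswith to build a boolean mask and Series.where to keep A's value only on matching rows. Note that a prefix test also matches strings like "Units", whereas a pattern such as ^Unit\d+ requires digits after "Unit"; the shortened sample data here is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["MARYLAND", "Unit6", "Unit7", "NEW SECTOR", "Unit1"]})

# True for rows whose text starts with "Unit"
is_unit = df["A"].str.startswith("Unit")
# Keep A's value where the mask is True; use an empty string everywhere else
df["B"] = df["A"].where(is_unit, "")
print(df)
```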
Finally, now that the "Unit" strings have been copied into the new column, I want to remove those values from column A:
A               B
MARYLAND
                Unit6
                Unit7
                Unit8
NEW SECTOR
                Unit1
                Unit2
NORTH SECTOR
                Unit1
                Unit2
PVT SECTOR
PUBLIC SECTOR
                Unit1
                Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
                Unit1
                Unit2
                Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON
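One way to do the full "move" (copy into B, then blank out A) is sketched below, assuming the same prefix test as a stand-in for the regex. Series.mask is the mirror image of Series.where: it replaces values where the condition is True. The mask is computed once, before A is modified, so both steps use the same rows.

```python
import pandas as pd

df = pd.DataFrame({"A": ["MARYLAND", "Unit6", "NEW SECTOR", "Unit1", "Unit2"]})

# Compute the mask once, before column A is changed
is_unit = df["A"].str.startswith("Unit")
# Copy the "Unit" strings into B (empty string elsewhere)
df["B"] = df["A"].where(is_unit, "")
# Blank them out in A: mask replaces values where the condition is True
df["A"] = df["A"].mask(is_unit, "")
print(df)
```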
Answer 0 (score: 1)
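# Capture a leading "Unit<digits>" token into B (NaN where there is no match)
df['B'] = df['A'].str.extract(r'(^Unit\d+)', expand=False)
# Blank out column A wherever B received a value
df.loc[df['B'].notnull(), 'A'] = ''
# Replace the remaining NaNs in B with empty strings (avoids chained inplace fillna)
df['B'] = df['B'].fillna('')
print(df)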
    A               B
0   MARYLAND
1                   Unit6
2                   Unit7
3                   Unit8
4   NEW SECTOR
5                   Unit1
6                   Unit2
7   NORTH SECTOR
8                   Unit1
9                   Unit2
10  PVT SECTOR
11  PUBLIC SECTOR
12                  Unit1
13                  Unit2
14  CENTRAL SECTOR
15  THERMAL
16  SOUTH SECTOR
17                  Unit1
18                  Unit2
19                  Unit3
20  ACCOUNT SECTOR
21  DOLBY DIGITAL
22  WASHINGTON
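To see what str.extract does in this answer, here is a small self-contained sketch. "Unit12" and "UnitX" are made-up values, not from the question, chosen to show that the \d+ in the pattern requires at least one digit after "Unit", so "UnitX" is not moved. With a single capture group and expand=False, extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"A": ["MARYLAND", "Unit6", "Unit12", "UnitX", "NEW SECTOR"]})

# One capture group + expand=False -> a Series; non-matching rows become NaN
extracted = df["A"].str.extract(r"^(Unit\d+)", expand=False)
print(extracted.isna().tolist())  # [True, False, False, True, True]

df["B"] = extracted.fillna("")
# Blank A only on rows where something was extracted
df.loc[df["B"] != "", "A"] = ""
print(df["A"].tolist())  # ['MARYLAND', '', '', 'UnitX', 'NEW SECTOR']
print(df["B"].tolist())  # ['', 'Unit6', 'Unit12', '', '']
```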
Answer 1 (score: 0)
Another approach, indexing column A with a boolean mask:
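# Select only the "Unit" rows; index alignment leaves NaN in all other rows of B
df["B"] = df["A"][df["A"].str.contains(r"^Unit", regex=True)]
df["B"] = df["B"].fillna("")  # NaN -> empty string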
A B
0 MARYLAND
1 Unit6 Unit6
2 Unit7 Unit7
3 Unit8 Unit8
4 NEW SECTOR
5 Unit1 Unit1
6 Unit2 Unit2
7 NORTH SECTOR
8 Unit1 Unit1
9 Unit2 Unit2
10 PVT SECTOR
11 PUBLIC SECTOR
12 Unit1 Unit1
13 Unit2 Unit2
14 CENTRAL SECTOR
15 THERMAL
16 SOUTH SECTOR
17 Unit1 Unit1
18 Unit2 Unit2
19 Unit3 Unit3
20 ACCOUNT SECTOR
21 DOLBY DIGITAL
22 WASHINGTON
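This answer only performs the copy step; its output still shows the "Unit" values in column A. A sketch that extends it with the question's final step follows. It also passes na=False to str.contains, on the assumption that column A might contain missing values: without it, missing cells produce NaN in the mask, and pandas rejects a boolean index that contains NaN. The NaN row in the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["MARYLAND", "Unit6", np.nan, "Unit1"]})

# na=False treats missing cells as "no match", keeping the mask purely boolean
mask = df["A"].str.contains(r"^Unit", regex=True, na=False)
# Boolean selection keeps only matching rows; assignment aligns on the index
df["B"] = df["A"][mask]
df["B"] = df["B"].fillna("")
# Final step from the question: remove the copied values from column A
df.loc[mask, "A"] = ""
print(df)
```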