I have a pandas DataFrame that looks like this:
> row extract_column
> 0 412952266-desiredtext1»randtext-irrelevant
> 1 512952766-desiredtext1»randtext-irrelevant
> 2 212952766-desiredtext1»randtext-irrelevant
> 3 112953066-desiredtext1»randtext-irrelevant
> 4 712953066-desiredtext1»randtext-irrelevant
> 5 612953366-desiredtext1»randtext-irrelevant
> 6 912953366-desiredtext1»randtext-irrelevant
> 7 412954866-desiredtext1»randtext-irrelevant
> 8 312954966-desiredtext1»randtext-irrelevant
> 9 212954966-desiredtext1»randtext-irrelevant
> 10 612955866-desiredtext1»randtext-irrelevant
> 11 912256266-desiredtext1»randtext-irrelevant
> 12 812256366-desiredtext1»randtext-irrelevant
> 13 512256566-desiredtext1»randtext-irrelevant
> 14 412256566-desiredtext1»randtext-irrelevant
> 15 312256566-desiredtext1»randtext-irrelevant
> 16 212256566-desiredtext1»randtext-irrelevant
> 17 612256566-desiredtext1»randtext-irrelevant
> 18 812956666-desiredtext2»randtext-irrelevant
> 19 912957166-desiredtext2»randtext-irrelevant
> 20 012957866-desiredtext2»randtext-irrelevant
> 21 12952966-desiredtext2»randtext-irrelevant
> 22 2012953066-desiredtext2»randtext-irrelevant
> 23 012953066-desiredtext2»randtext-irrelevant
> 24 312953066-desiredtext2»randtext-irrelevant
> 25 112254166-desiredtext2»randtext-irrelevant
> 26 712254166-desiredtext2»randtext-irrelevant
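I want to extract the desiredtext1 / desiredtext2 field from extract_column. The desired text is always followed by a » symbol and preceded by 9 digits and a dash.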
Answer 0 (score: 2)
Try using str.extract:
df.extract_column.str.extract(r'-([^.]*)»', expand=False)
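For anyone who wants to try this without the full frame, here is a minimal sketch on a two-row sample built from the question's data. The pattern is a slight variation, not the one above: it stops at the first » instead of excluding dots, on the assumption that the desired text itself never contains ». With a single capture group and expand=False, str.extract returns a Series.

```python
import pandas as pd

# Two rows copied from the question; the real frame has 27 rows.
df = pd.DataFrame({
    "extract_column": [
        "412952266-desiredtext1»randtext-irrelevant",
        "812956666-desiredtext2»randtext-irrelevant",
    ]
})

# Capture everything between the first dash and the first » after it.
# Assumption: the desired text never contains the » character itself.
result = df["extract_column"].str.extract(r"-([^»]*)»", expand=False)

print(result.tolist())  # ['desiredtext1', 'desiredtext2']
```

To keep the result next to the original data, it can be assigned to a new column, e.g. df["desired"] = result (the column name is just an illustration).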
Answer 1 (score: 0)
df.extract_column.str.extract('-(\\w+)')
Out[100]:
0
0 desiredtext1
1 desiredtext1
2 desiredtext1
3 desiredtext1
4 desiredtext1
5 desiredtext1
6 desiredtext1
7 desiredtext1
8 desiredtext1
9 desiredtext1
10 desiredtext1
11 desiredtext1
12 desiredtext1
13 desiredtext1
14 desiredtext1
15 desiredtext1
16 desiredtext1
17 desiredtext1
18 desiredtext2
19 desiredtext2
20 desiredtext2
21 desiredtext2
22 desiredtext2
23 desiredtext2
24 desiredtext2
25 desiredtext2
26 desiredtext2
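Note that '-(\w+)' grabs the first run of word characters after any dash, so it does not check for the leading digits or the trailing ». If that matters, a stricter pattern might look like the sketch below. It uses \d+ rather than \d{9} because rows 21 and 22 in the question have 8 and 10 leading digits. The group name "desired" and the malformed last row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "extract_column": [
        "412952266-desiredtext1»randtext-irrelevant",
        "12952966-desiredtext2»randtext-irrelevant",    # 8 leading digits (row 21)
        "2012953066-desiredtext2»randtext-irrelevant",  # 10 leading digits (row 22)
        "no-delimiter-here",                             # hypothetical malformed row
    ]
})

# Strict: digits at the start, a dash, then the text up to the first ».
# The named group becomes the column name of the returned DataFrame.
strict = df["extract_column"].str.extract(r"^\d+-(?P<desired>[^»]+)»")

# Loose: the pattern from this answer, for comparison.
loose = df["extract_column"].str.extract(r"-(\w+)", expand=False)

print(strict)
print(loose.tolist())
```

On the malformed row the strict pattern yields NaN, while the loose one returns "delimiter", so the strict version makes bad rows easier to spot with strict["desired"].isna().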