Question

I'm fairly new to Python. I have a column of data that contains both numbers and text, for example:

import pandas as pd
mycolumn=pd.Series(["I w0n 1200$ in poker and got 1050$ on my b111rthday",
                       "another month was b4d, I only earned 150$",
                       "d4d gave 2200, lost 0420$ in poker in 10570 Berlin"])

I want to extract all values above 1000, so that I get:

result=pd.Series([[1200,1050],[],[2200,10570]])

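The result doesn't have to be a pandas Series. Any other format is fine, as long as I can later use the empty cells (or something similar) to subset the rest of my data.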
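Because every row keeps a list, even an empty one, the list length can serve directly as a boolean mask for subsetting. A minimal sketch, assuming the rest of the data lives in a DataFrame aligned on the same index; the frame `df` and its `month` column are hypothetical, and `result` is the expected output stated in the question:

```python
import pandas as pd

mycolumn = pd.Series(["I w0n 1200$ in poker and got 1050$ on my b111rthday",
                      "another month was b4d, I only earned 150$",
                      "d4d gave 2200, lost 0420$ in poker in 10570 Berlin"])

# Expected output from the question: one (possibly empty) list per row.
result = pd.Series([[1200, 1050], [], [2200, 10570]])

# Hypothetical frame holding "the rest of the data", sharing the same index.
df = pd.DataFrame({"text": mycolumn, "month": ["Jan", "Feb", "Mar"]})

# An empty list has length 0, so the mask is True only for rows with a match.
has_big = result.map(len) > 0

subset = df[has_big]
print(subset.index.tolist())  # [0, 2]
```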

Answer 1

Use str.findall with (\d{4,}), where {4,} means at least 4 digits, i.e. numbers of 1000 or more:

In [876]: mycolumn.str.findall(r'(\d{4,})')
Out[876]:
0           [1200, 1050]
1                     []
2    [2200, 0420, 56454]
dtype: object
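Note that str.findall returns lists of strings, while the question's result holds integers. A short sketch of the conversion, run on the question's own mycolumn (the output above comes from a slightly different third string), which also makes the problem with "0420" visible once it is parsed as a number:

```python
import pandas as pd

mycolumn = pd.Series(["I w0n 1200$ in poker and got 1050$ on my b111rthday",
                      "another month was b4d, I only earned 150$",
                      "d4d gave 2200, lost 0420$ in poker in 10570 Berlin"])

# Per row, a list of matched substrings (strings, not ints).
matches = mycolumn.str.findall(r"\d{4,}")

# Convert each matched substring to int; empty lists stay empty.
as_ints = matches.map(lambda xs: [int(x) for x in xs])

print(as_ints.tolist())
# [[1200, 1050], [], [2200, 420, 10570]]
```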

And, as pointed out, if you don't want numbers that start with 0, use:

In [877]: mycolumn.str.findall(r'([1-9]\d{3,})')
Out[877]:
0     [1200, 1050]
1               []
2    [2200, 56454]
dtype: object
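Applied to the question's original data and combined with an int conversion, the [1-9]\d{3,} pattern reproduces the expected result exactly. A sketch:

```python
import pandas as pd

mycolumn = pd.Series(["I w0n 1200$ in poker and got 1050$ on my b111rthday",
                      "another month was b4d, I only earned 150$",
                      "d4d gave 2200, lost 0420$ in poker in 10570 Berlin"])

# [1-9] forbids a leading zero, so "0420" cannot start a match,
# and "420" on its own is too short for the remaining \d{3,}.
matches = mycolumn.str.findall(r"[1-9]\d{3,}")
result = matches.map(lambda xs: [int(x) for x in xs])

print(result.tolist())
# [[1200, 1050], [], [2200, 10570]]
```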

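Details (the input used for the outputs above; its third string differs from the one in the question):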

In [878]: mycolumn
Out[878]:
0    I w0n 1200$ in poker and got 1050$ on my b111r...
1            another month was b4d, I only earned 150$
2        d4d gave 2200, lost 0420$ with 56454 in poker
dtype: object
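Both patterns use digit count as a stand-in for "1000 or more". If the cutoff might change (say, 2000), an alternative that is not part of the answer above is to extract every run of digits and compare by numeric value. A minimal sketch; the function name numbers_at_least and its min_value parameter are hypothetical, and it keeps values >= min_value (switch to > if the boundary should be excluded):

```python
import pandas as pd

mycolumn = pd.Series(["I w0n 1200$ in poker and got 1050$ on my b111rthday",
                      "another month was b4d, I only earned 150$",
                      "d4d gave 2200, lost 0420$ in poker in 10570 Berlin"])


def numbers_at_least(s: pd.Series, min_value: int) -> pd.Series:
    """Per row, return the integers found in the text that are >= min_value.

    Every maximal run of digits is parsed with int(), so "0420" becomes 420
    and is compared by value rather than by how many digits it has.
    """
    runs = s.str.findall(r"\d+")
    return runs.map(lambda xs: [n for n in map(int, xs) if n >= min_value])


print(numbers_at_least(mycolumn, 1000).tolist())
# [[1200, 1050], [], [2200, 10570]]
print(numbers_at_least(mycolumn, 2000).tolist())
# [[], [], [2200, 10570]]
```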
