Question

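I'm confused about when to use str.findall and when to use str.match.

For example, I have a df with many rows of text, and I need to extract dates from it.

Let's say I want to pull out the rows that contain the word Mar (the abbreviation for March).

If I filter the df to the rows where there is a match: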

df[df.original.str.match(r'(Mar)')==True]

I get the following output:

204 Mar 10 1976 CPT Code: 90791: No medical servic...
299 March 1974 Primary ...

However, if I try the same regular expression with str.findall, I get nothing:

0      []
1      []
2      []
3      []
4      []
5      []
6      []
7      []
...

495              []
496              []
497              []
498              []
499              []
Name: original, Length: 500, dtype: object

Why? I'm sure this comes down to my not understanding the differences among match, find, findall, extract, and extractall.

Answer 1

I'll try to explain this using the documentation:

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
s

Output:

A    a1a2
B      b1
C      c1
dtype: object

We first create a Series like this, then apply extract, extractall, find, and findall to it.

s.str.extract(r"([ab])(\d)", expand=True)  # pat is a regex with capture groups; only the first match in each string is returned

    0   1
A   a   1
B   b   1
C   NaN NaN

s.str.extractall(r"([ab])(\d)")  # returns every match, one row per match

         0  1
  match
A 0      a  1
  1      a  2
B 0      b  1
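
As a small sketch of how the two relate, the same pattern can be written with named groups. The group names letter and digit are my own illustrative choices, not something from the question. The names become column labels, and extractall adds a "match" index level that numbers the matches within each string:

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])

# Named groups ("letter" and "digit" are illustrative names) become the column labels
pat = r"(?P<letter>[ab])(?P<digit>\d)"

first = s.str.extract(pat, expand=True)  # one row per input string, first match only
every = s.str.extractall(pat)            # one row per match, indexed by (label, match)

print(first.columns.tolist())  # ['letter', 'digit']
print(every.index.tolist())    # [('A', 0), ('A', 1), ('B', 0)]
```

Row C has no match, so it shows up as NaN in the extract result and is absent from the extractall result.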

s.str.find(r"([ab])(\d)")  # every value is -1 because find treats its argument as a literal substring, not a regex

s.str.find('a')
A    0
B   -1
C   -1
dtype: int64
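
If what you want is a regex test rather than a literal substring position, str.contains is probably the closer tool. A minimal sketch on the same Series, with variable names of my own choosing:

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])

# str.find looks for a literal substring and returns its position (-1 if absent)
literal_pos = s.str.find("1")

# str.contains interprets the pattern as a regex by default and returns a boolean mask
regex_mask = s.str.contains(r"[ab]\d")

print(literal_pos.tolist())  # [1, 1, 1]
print(regex_mask.tolist())   # [True, True, False]
```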

s.str.findall(r"([ab])(\d)")  # pat is a regex; returns all matches in each string as a list
A    [(a, 1), (a, 2)]
B            [(b, 1)]
C                  []
dtype: object
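
Coming back to the question: str.match only tests whether the pattern matches at the start of each string (like re.match), while str.findall collects every match anywhere in the string (like re.findall). Here is a rough sketch on made-up rows modeled on the "original" column. The sample strings are my assumption, not the asker's data:

```python
import pandas as pd

# Hypothetical sample data modeled on the question's "original" column
df = pd.DataFrame({"original": [
    "Mar 10 1976 CPT Code: 90791",
    "Seen on 12 Mar 1980",
    "March 1974 Primary",
    "no date here",
]})

# str.match anchors at the start of each string (like re.match)
starts_with_mar = df.original.str.match(r"(Mar)")

# str.contains searches anywhere in the string (like re.search)
contains_mar = df.original.str.contains(r"Mar")

# str.findall returns a list of every match per row (like re.findall)
all_mar = df.original.str.findall(r"(Mar)")

print(starts_with_mar.tolist())  # [True, False, True, False]
print(contains_mar.tolist())     # [True, True, True, False]
print(all_mar.tolist())          # [['Mar'], ['Mar'], ['Mar'], []]
```

Note that findall returns one list per row, and rows without a match get an empty list. The output printed in the question is truncated to rows 0-7 and 495-499, so rows 204 and 299 are simply not displayed; they may well hold non-empty lists. To keep only the rows with at least one match, something like df[all_mar.str.len() > 0] should work.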
