Question

Input data:

                        name  Age Zodiac Grade            City  pahun
0                   /extract   30  Aries     A            Aura  a_b_c
1  /abc/236466/touchbar.html   20    Leo    AB      Somerville  c_d_e
2                    Brenda4   25  Virgo     B  Hendersonville    f_g
3     /abc/256476/mouse.html   18  Libra    AA          Gannon  h_i_j

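I am trying to select rows based on a regular expression applied to the name column. The regex should pick out a number that is six digits long.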

For example:
/abc/236466/touchbar.html  - 236466

This is the code I used:

df=df[df['name'].str.match(r'\d{6}') == True]

The line above does not match any rows at all.

Expected:

                         name  Age Zodiac Grade            City  pahun
0  /abc/236466/touchbar.html   20    Leo    AB      Somerville  c_d_e
1     /abc/256476/mouse.html   18  Libra    AA          Gannon  h_i_j

Can anyone tell me what I am doing wrong?

Answer 1

str.match only looks for a match at the start of the string.

Use str.contains with a regex instead:

df=df[df['name'].str.contains(r'/\d{6}/')]

This finds entries that contain / + 6 digits + /.

Or, to make sure you only match runs of exactly 6 digits, and not runs of 7 or more digits:
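To make the difference concrete, here is a minimal sketch that rebuilds part of the question's frame (only the name and Age columns, which is a simplification) and compares the two methods. The variable names match_mask, contains_mask and result are illustrative.

```python
import pandas as pd

# Rebuild two columns of the frame from the question (other columns omitted).
df = pd.DataFrame({
    "name": ["/extract", "/abc/236466/touchbar.html", "Brenda4", "/abc/256476/mouse.html"],
    "Age": [30, 20, 25, 18],
})

# str.match is anchored at the start of each string. Every name starts with
# "/" or a letter, so no row matches and the mask is all False.
match_mask = df["name"].str.match(r"\d{6}")

# str.contains searches anywhere in the string.
contains_mask = df["name"].str.contains(r"/\d{6}/")

# Keep the matching rows and renumber them from 0, as in the expected output.
result = df[contains_mask].reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call is only there to reproduce the 0, 1 index shown in the expected output; drop it to keep the original row labels.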

df=df[df['name'].str.contains(r'(?<!\d)\d{6}(?!\d)')]

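where

(?<!\d) - ensures there is no digit immediately to the left
\d{6} - matches exactly six digits
(?!\d) - ensures there is no digit immediately to the right.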
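The lookaround pattern can be checked on its own with Python's re module, which is what str.contains relies on for regex searches. This is a small sketch; the second and third sample strings are made up for illustration and do not come from the question.

```python
import re

# Exactly six digits, with no digit immediately before or after the run.
EXACT_SIX = re.compile(r"(?<!\d)\d{6}(?!\d)")

samples = [
    "/abc/236466/touchbar.html",   # six digits between slashes (from the question)
    "item236466.html",             # six digits, no slashes (hypothetical)
    "/abc/2364667/touchbar.html",  # seven digits (hypothetical)
]

# search() scans the whole string, like str.contains does.
hits = [bool(EXACT_SIX.search(s)) for s in samples]
print(hits)
```

Unlike the /\d{6}/ pattern, this version does not require surrounding slashes, so it also accepts the second sample while still rejecting the seven-digit run.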

Answer 2

You are almost there; use str.contains instead:

df[df['name'].str.contains(r'\d{6,}')]
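Note that \d{6,} means six or more digits, so it is looser than a pattern for exactly six. The sketch below compares the two on a small Series; the seven-digit name is a made-up example and the variable names are illustrative.

```python
import pandas as pd

names = pd.Series([
    "/abc/236466/touchbar.html",   # six digits (from the question)
    "/abc/2364667/touchbar.html",  # seven digits (hypothetical)
    "Brenda4",                     # a single digit (from the question)
])

# Six or more consecutive digits anywhere in the string.
six_or_more = names.str.contains(r"\d{6,}")

# Exactly six consecutive digits, bounded by non-digits or string edges.
exactly_six = names.str.contains(r"(?<!\d)\d{6}(?!\d)")

print(pd.DataFrame({"name": names, "six_or_more": six_or_more, "exactly_six": exactly_six}))
```

For the data in the question both patterns select the same two rows; they only differ when a name contains a longer run of digits.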
