Question

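I have a DataFrame with a column called "Description". I want to go through all the text in this column and identify the rows whose description contains a number of at least 3 digits.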

Here is what I have so far:

import re 
df['StrDesc'] = df['Description'].str.split()
y=re.findall('[0-9]{3}',str(df['StrDesc'])
print(y)

I converted my text column to a string. Do I then need to run a for loop over each row before applying the final regular expression?

Am I taking the best approach?

The error I get is "unexpected EOF while parsing."
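"Unexpected EOF while parsing" is the SyntaxError Python raises when a bracket is still open at the end of the input: the re.findall line above opens two parentheses (findall( and str() but closes only one. Python 3.10 and later word the same problem as "'(' was never closed". A minimal check, compiling the line as a string so no DataFrame is needed:

```python
# The line from the question, kept as a string so it can be compiled without running it
broken = "y=re.findall('[0-9]{3}',str(df['StrDesc'])"
fixed = broken + ")"  # add the missing closing parenthesis for findall(

def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python (names are not resolved)."""
    try:
        compile(source, "<question>", "exec")
        return True
    except SyntaxError as err:
        # The message text depends on the Python version
        print("SyntaxError:", err.msg)
        return False

print(parses(broken))  # prints the SyntaxError message, then False
print(parses(fixed))   # True
```

Even with the parenthesis fixed, str(df['StrDesc']) searches the printed form of the whole column: pandas abbreviates it with "..." once it exceeds display.max_rows, and the printed index labels are included, so a row label such as 100 would itself match [0-9]{3}.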

Answer 1

Use str.findall; the split is unnecessary:

y = df['Description'].str.findall('[0-9]{3}')
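Since the goal is to identify rows rather than collect the numbers, Series.str.contains returns a boolean mask that can index the DataFrame directly. Like str.findall, it applies the pattern to every row, so no explicit for loop is needed. A minimal sketch on a small frame whose sample text is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question
df = pd.DataFrame({'Description': ['order 1234 shipped',
                                   'no digits here',
                                   'call 55 or 7',
                                   'room 101']})

# True for rows whose text contains three consecutive digits anywhere
mask = df['Description'].str.contains(r'[0-9]{3}')

# Keep only the matching rows
rows = df[mask]
print(rows)
```

Note that "order 1234 shipped" also matches, because [0-9]{3} is satisfied by any three digits in a row. If the column can contain missing values, passing na=False keeps them out of the mask.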

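After some testing, though, a general solution turns out to be a bit more complex, because [0-9]{3} also matches three digits inside longer numbers (for example 586 in 5867):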

import pandas as pd

df = pd.DataFrame({'Description':['354 64 133 5867 4 te345',
                                  'rt34 3tyr 456',
                                  '23 gh346h rt 9404']})

print(df)
               Description
0  354 64 133 5867 4 te345
1            rt34 3tyr 456
2        23 gh346h rt 9404

y = df['Description'].str.findall(r'(?:(?<!\d)\d{3}(?!\d))')
print(y)
0    [354, 133, 345]
1              [456]
2              [346]
Name: Description, dtype: object
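The lookbehind (?<!\d) and lookahead (?!\d) prevent a match from starting or ending next to another digit, so the pattern above returns only numbers that are exactly three digits long; that is why 5867 and 9404 are skipped. If numbers with three or more digits should also count, one option (a variation, not part of the answer above) is the quantifier \d{3,}: scanning left to right, each match starts at the first digit of a run and greedily takes the whole run, so the lookarounds are no longer needed. A sketch comparing both on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Description': ['354 64 133 5867 4 te345',
                                   'rt34 3tyr 456',
                                   '23 gh346h rt 9404']})

# Exactly three digits: not preceded or followed by another digit
exactly_three = df['Description'].str.findall(r'(?<!\d)\d{3}(?!\d)')

# Three or more digits: greedy \d{3,} consumes each whole run of digits
three_or_more = df['Description'].str.findall(r'\d{3,}')

# Boolean mask: rows containing at least one run of three or more digits
has_long_number = df['Description'].str.contains(r'\d{3,}')

print(exactly_three.tolist())
print(three_or_more.tolist())
print(df[has_long_number])
```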

How do I find a specific expression in a DataFrame column?
