Question

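Can someone help me identify the words in a text file? Uppercase or lowercase, but without digits, parentheses, dashes, punctuation, etc. (whatever the definition of a "word" is).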

I was thinking of:

r"\w+ \w+"

but it doesn't work.

Thanks

Answer 1

You can use a character class to specify the range of expected characters:

r'[a-zA-Z]+'

Read more here: http://www.regular-expressions.info/charclass.html

In Python, you can use the function re.findall() to return all matches in a list, or re.finditer() to return an iterator of match objects.
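To make the difference between the two functions concrete, here is a minimal sketch. The sample string and the variable names (sample, words, spans) are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample text, not taken from the original question.
sample = "Hello, world! 42 apples (and 7 pears) -- fine."

# re.findall returns a plain list of the matched strings.
words = re.findall(r"[a-zA-Z]+", sample)
print(words)

# re.finditer yields match objects, which also expose where each match starts.
spans = [(m.group(), m.start()) for m in re.finditer(r"[a-zA-Z]+", sample)]
print(spans)
```

re.findall is usually enough when only the words themselves are needed; re.finditer is handy when the positions matter or the input is large.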

Answer 2

re.findall(r"\b[a-z]+\b",test_str,re.I)

You can do it this way.
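The \b word boundaries change what counts as a word. As a rough sketch (test_str here is a hypothetical input, and bounded/unbounded are illustrative names), the bounded pattern skips letter runs that touch digits or underscores, while the plain character class extracts the letters from them:

```python
import re

# Hypothetical input mixing plain words with letter/digit/underscore tokens.
test_str = "hey abc123 there x_y"

# With \b on both sides, a letter run must be a whole "word":
# "abc123" and "x_y" are rejected because digits and "_" are word characters.
bounded = re.findall(r"\b[a-z]+\b", test_str, re.I)

# Without \b, every run of letters is returned, even inside mixed tokens.
unbounded = re.findall(r"[a-z]+", test_str, re.I)

print(bounded)
print(unbounded)
```

Which behaviour is wanted depends on whether a token like "abc123" should contribute "abc" or be ignored entirely.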

Answer 3

import re

text = "hey there 222 how are you ??? fine I hope!"
# Find every run of letters, ignoring case (Python 3 syntax).
print(re.findall("[a-z]+", text, re.IGNORECASE))
# ['hey', 'there', 'how', 'are', 'you', 'fine', 'I', 'hope']

Regex explanation

[a-z]+

Options: Case insensitive;

Match a single character in the range between “a” and “z” «[a-z]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Live Python demo

http://ideone.com/JT8ZjD

Regex: how to identify words on a screen (or how to exclude punctuation and digits)

3 answers: