Get words that are between m and n characters long

Asked: 2015-11-04 12:31:19

Tags: python regex

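I am trying to get all names that start with an uppercase letter and end with a period on the same line, where the name is between 3 and 5 characters long.

My text is as follows: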

 King. Great happinesse

 Rosse. That now Sweno, the Norwayes King,
Craues composition:
Nor would we deigne him buriall of his men,
Till he disbursed, at Saint Colmes ynch,
Ten thousand Dollars, to our generall vse

 King. No more that Thane of Cawdor shall deceiue
Our Bosome interest: Goe pronounce his present death,
And with his former Title greet Macbeth

 Rosse. Ile see it done

 King. What he hath lost, Noble Macbeth hath wonne.

I have been testing this at this link. I want to get all the words that are between 3 and 5 characters long, but without success.

2 answers:

Answer 0 (score: 3)

Does this produce the output you want?

import re

re.findall(r'[A-Z].{2,4}\.', text)

When text contains the text from your question, it produces this output:

['King.', 'Rosse.', 'King.', 'Rosse.', 'King.']

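The regex pattern matches an uppercase letter followed by any 2 to 4 characters and then a literal period, so the name itself (excluding the period) is 3 to 5 characters long. You can tighten it if needed: replacing the dot with [a-z] gives the pattern [A-Z][a-z]{2,4}\. which matches an uppercase character, followed by 2 to 4 lowercase characters, followed by a literal dot/period.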
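To make the difference between the two patterns concrete, here is a small sketch. The sample string is made up for illustration (it is not from the question), and the leading \b word boundary is an extra assumption on my part so that a match cannot begin in the middle of a word.

```python
import re

# Hypothetical sample string, chosen so the two patterns disagree.
sample = "Go on. King."

# Loose pattern: uppercase letter, any 2-4 characters, literal period.
loose = re.findall(r'[A-Z].{2,4}\.', sample)

# Tightened pattern: uppercase letter, 2-4 lowercase letters, literal period.
# The \b is an addition (not in the pattern above) that anchors the match
# to the start of a word.
tight = re.findall(r'\b[A-Z][a-z]{2,4}\.', sample)

print(loose)  # ['Go on.', 'King.']
print(tight)  # ['King.']
```

The loose pattern accepts the space inside "Go on." because the dot matches any character except a newline, while the tightened one only accepts runs of lowercase letters.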

If you don't want duplicates, you can use a set to get rid of them:

>>> set(re.findall(r'[A-Z].{2,4}\.', text))
set(['Rosse.', 'King.'])
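
A set does not keep the order in which the names first appear. If that order matters, one possible sketch (assuming Python 3.7 or later, where dicts preserve insertion order) is to deduplicate through dict.fromkeys. The text below is a shortened excerpt of the question's sample.

```python
import re

# Shortened excerpt of the text from the question.
text = (" King. Great happinesse\n\n"
        " Rosse. Ile see it done\n\n"
        " King. What he hath lost, Noble Macbeth hath wonne.\n")

matches = re.findall(r'[A-Z].{2,4}\.', text)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first-seen order.
unique_in_order = list(dict.fromkeys(matches))

print(unique_in_order)  # ['King.', 'Rosse.']
```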

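Answer 1 (score: 0)

You may have your own reasons for wanting to use a regex here, but Python provides a rich set of string methods, and (IMO) code that uses them is easier to understand: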

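matched_words = []
# Use a context manager so the file is closed automatically.
with open('text.txt') as f:
    for line in f:
        for word in line.split():
            # Uppercase first letter, trailing period, 3-5 characters before the period.
            if word[0].isupper() and word.endswith('.') and 3 <= len(word) - 1 <= 5:
                matched_words.append(word)
print(matched_words)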
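
If the text is already in a string rather than in a file, the same check can be wrapped in a function. This is only a sketch: the function name names_with_period and its min_len/max_len parameters are made up for illustration, and the sample is a shortened excerpt of the question's text.

```python
def names_with_period(text, min_len=3, max_len=5):
    """Return whitespace-separated words that start with an uppercase letter,
    end with a period, and have min_len..max_len characters before it."""
    matched = []
    for line in text.splitlines():
        for word in line.split():
            if (word[0].isupper() and word.endswith('.')
                    and min_len <= len(word) - 1 <= max_len):
                matched.append(word)
    return matched


# Shortened excerpt of the text from the question.
sample = (" King. Great happinesse\n\n"
          " Rosse. Ile see it done\n\n"
          " King. What he hath lost, Noble Macbeth hath wonne.\n")

result = names_with_period(sample)
print(result)  # ['King.', 'Rosse.', 'King.']
```

Note that this checks the whole whitespace-delimited word, so "wonne." is skipped because it starts with a lowercase letter, and a longer name such as "Macbeth." would be rejected by the length test.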