Python regex: matching words that contain "ab" or "ba"

Posted: 2016-03-27 08:45:15

Tags: python regex

I'm trying to match words that contain the letter pair "ab" or "ba", for example "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:

r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"

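But the matches include words that start or end with ", (, ), / and other non-alphanumeric characters. How do I get rid of those? I only want to match a list of words.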

import re

counts = {}  # word -> number of occurrences (avoids shadowing the built-in dict)

with open('C:/Python27/brown_half.txt', 'r') as f:
    data = f.read()
words = data.split()  # whitespace-separated tokens, punctuation still attached

for token in words:
    match2 = re.findall(r"\w*(ab|ba)\w*", token)
    if match2:
        # the raw token is the key, so "fabrics," and "fabrics" are counted separately
        counts[token] = counts.get(token, 0) + 1

for key in sorted(counts):
    print("%s: %s" % (key, counts[key]))
print(len(counts))

At this point I don't know how to combine this with re.compile(...), which is the approach the first comment suggested...
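One possible way to combine the question's letter-pair pattern with re.compile, sketched under the assumption that a "word" is simply a run of ASCII letters: extract the words with one compiled pattern so punctuation never becomes part of a key, then test each word with the compiled pair pattern. The names PAIR, WORD and count_pair_words are illustrative, not from the thread.

```python
import re
from collections import Counter

# The question's pattern without the lookaheads: [Aa](?=[Bb])[Bb] matches
# exactly the same text as [Aa][Bb], since the lookahead only re-checks
# the character that is consumed next anyway.
PAIR = re.compile(r"[Aa][Bb]|[Bb][Aa]")
# Assumption: a "word" is a run of ASCII letters, so "(fabrics)," yields "fabrics".
WORD = re.compile(r"[A-Za-z]+")


def count_pair_words(text):
    """Count words that contain 'ab' or 'ba' in any letter case."""
    counts = Counter()
    for word in WORD.findall(text):
        if PAIR.search(word):
            counts[word] += 1
    return counts


sample = "Abolition, (fabrics) and probable; fabrics! test"
for word, n in sorted(count_pair_words(sample).items()):
    print("%s: %s" % (word, n))
# Abolition: 1
# fabrics: 2
# probable: 1
```

For the question's file, passing f.read() to count_pair_words would replace the split-and-loop part. Note that [A-Za-z]+ also breaks hyphenated words apart and drops digits, which may or may not be what you want.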

5 answers:

Answer 0 (score: 2)

To match all words containing ab or ba (case-insensitive):

import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))

# to print the first match
print(pattern.search(text).group(0))

https://regex101.com/r/uH3xM9/1
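If a plain list of words is wanted, re.findall can be used instead of finditer. Because findall returns the captured groups when a pattern has any, this sketch drops the outer group and makes the inner one non-capturing so findall returns the whole words directly; \b is added only to make the word-boundary intent explicit.

```python
import re

text = 'fabh, obar! (Abtt) yybA, kk'
# (?:...) is non-capturing, so findall returns the full match instead of a group;
# \b anchors at word boundaries (redundant next to \w*, but states the intent).
pattern = re.compile(r"\b\w*(?:ab|ba)\w*\b", re.IGNORECASE)

words = pattern.findall(text)
print(words)  # ['fabh', 'obar', 'Abtt', 'yybA']
```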

Answer 1 (score: 1)

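In this case, regular expressions aren't the best tool for the job; for something this simple they overcomplicate things. You can use Python's built-in in operator instead (works in both Python 2 and 3)...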

sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]

for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))

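As you can see, 'ab' in word evaluates to True if the characters 'ab' appear in word exactly as written, and to False otherwise. For example, 'ba' in 'probable' is True, whereas 'ab' in 'Abolition' is False because of the capital A. The second line splits the sentence into words and strips out any punctuation. word = word.lower() lowercases word before the comparison, so that for word = 'Abolition', 'ab' in word is True.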
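The same in-operator check can be wrapped in a small helper that returns the matching words instead of printing them. This is a sketch; the name find_pair_words is hypothetical, and str.isalpha is passed to filter directly in place of the lambda, which behaves the same.

```python
def find_pair_words(sentence):
    """Return the lowercased words of sentence that contain 'ab' or 'ba'."""
    result = []
    for token in sentence.split():
        # keep only letters, so "Abolition." becomes "Abolition", then lowercase
        word = ''.join(filter(str.isalpha, token)).lower()
        if 'ab' in word or 'ba' in word:
            result.append(word)
    return result


sentence = ("There are no probable situations whereby that may happen, "
            "or so it seems since the Abolition.")
print(find_pair_words(sentence))  # ['probable', 'abolition']
```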

Answer 2 (score: 1)

I would do it like this:

  1. First, strip the unwanted characters from the string, using either of these two techniques:

    a - Build a translation table and use the translate method:

    >>> import string
    >>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
    >>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
    >>> s = s.translate(del_punc)
    >>> print(s)
    abolition fabrics probable test case bank halfback 1ablution
    

    b - Use the re.sub method:

    >>> import string
    >>> import re
    >>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
    >>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
    >>> print(s)
    abolition fabrics probable test case bank halfback 1ablution
    
  2. Next, find the words that contain 'ab' or 'ba':

    a - Split on whitespace and check each token for the target substrings. This is the approach I'd recommend:

    >>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
    ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
    

    b - Use the re.finditer method:

    >>> pat = re.compile(r'\b\w*(?:ab|ba)\w*\b', re.IGNORECASE)
    >>> # \w* keeps each match inside a single word
    >>> for m in pat.finditer(s):
    ...     print(m.group())
    ...
    abolition
    fabrics
    probable
    bank
    halfback
    1ablution
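The two steps above can be folded into one helper. This sketch uses str.maketrans('', '', string.punctuation), which builds the same kind of deletion table as the dict.fromkeys version in 1a; the function name words_with_pair is only for illustration.

```python
import string


def words_with_pair(s):
    """Strip ASCII punctuation, then return the tokens containing 'ab' or 'ba'."""
    # maketrans('', '', chars) returns a table that deletes every char in chars
    table = str.maketrans('', '', string.punctuation)
    cleaned = s.translate(table)
    return [w for w in cleaned.split() if 'ab' in w.lower() or 'ba' in w.lower()]


s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
print(words_with_pair(s))
# ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
```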
    

Answer 3 (score: 0)

text = "your string here"
lowercase = text.lower()  # lowercase first so the check is case-insensitive
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)

Answer 4 (score: 0)

Try this:

[(),/]*([a-z]|(ba|ab))+[(),/]*
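A quick way to see how this pattern behaves, assuming it is applied to one token at a time with re.fullmatch (the answer does not say how it is meant to be called). The names loose and strict, and the tightened pattern, are illustrative only; both assume lowercase input, since neither uses re.IGNORECASE.

```python
import re

# Answer 4's pattern as posted: the [a-z] alternative alone satisfies the
# repetition, so any lowercase word passes, with or without "ab"/"ba".
loose = re.compile(r"[(),/]*([a-z]|(ba|ab))+[(),/]*")
# Hypothetical tightened variant: at least one "ab" or "ba" is required,
# and group 1 captures the word without the surrounding punctuation.
strict = re.compile(r"[(),/]*([a-z]*(?:ba|ab)[a-z]*)[(),/]*")

for token in ["(abolition)", "fabrics,", "test", "case/"]:
    print(token, bool(loose.fullmatch(token)), bool(strict.fullmatch(token)))
# (abolition) True True
# fabrics, True True
# test True False
# case/ True False
```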