Question

I'm trying to match words that contain the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:

r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"

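But it also picks up words that begin or end with ", (, ), / and other non-alphanumeric characters. How can I strip those out? I just want to match a list of words.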

import sys
import re

word=[]

dict={}

f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')

data = f.read()
word = data.split() # word is list

f.close()

for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1

for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())

Here, I don't know how to combine this with "re.compile ~~", the approach mentioned in the first comment...

Answer 1

Match all words containing ab or ba (case-insensitive):

import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

# Print every match
for match in pattern.finditer(text):
    print(match.group(0))

# Print only the first match
print(pattern.search(text).group(0))

https://regex101.com/r/uH3xM9/1
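
To tie this back to the counting loop in the question, here is a minimal sketch (Python 3 syntax) that reuses one compiled pattern and tallies each matched word. The sample text and the function name count_ab_ba_words are assumptions for illustration, and collections.Counter stands in for the hand-rolled dictionary.

```python
import re
from collections import Counter

# Compile once and reuse. \w* on both sides keeps each match inside a
# single word, so surrounding punctuation is left out of the result.
AB_BA_WORD = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)

def count_ab_ba_words(text):
    """Return a Counter of lowercased words that contain 'ab' or 'ba'."""
    return Counter(m.group(0).lower() for m in AB_BA_WORD.finditer(text))

# Hypothetical sample text, not taken from the question's input file.
counts = count_ab_ba_words('fabh, obar! (Abtt) yybA, kk "probable" probable')
for word in sorted(counts):
    print("%s: %d" % (word, counts[word]))
print(len(counts))
```

Because the pattern is applied to the whole text with finditer, there is no need to split on whitespace first.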

Answer 2

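Regular expressions are not the best tool for the job here; they overcomplicate such a simple case. You can use Python's built-in in operator instead (works on both Python 2 and 3)...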

sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]

for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))

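As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, and to False otherwise. For example, 'ba' in 'probable' is True and 'ab' in 'Abolition' is False. The second line splits the sentence into words and strips out any non-alphabetic characters such as punctuation. word = word.lower() makes word lowercase before the comparison, so that for word = 'Abolition', 'ab' in word is True.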
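
The same idea can be wrapped in a small function to make the case-sensitivity point concrete. This is only a sketch; the name has_ab_or_ba is hypothetical and not part of the answer above.

```python
def has_ab_or_ba(token):
    """True if the letters of token contain 'ab' or 'ba', ignoring case."""
    # Keep only alphabetic characters, then lowercase before the substring test.
    letters = ''.join(ch for ch in token if ch.isalpha()).lower()
    return 'ab' in letters or 'ba' in letters

print('ab' in 'Abolition')           # False: the raw test is case-sensitive
print(has_ab_or_ba('(Abolition).'))  # True: punctuation stripped, then lowercased
print(has_ab_or_ba('seems'))         # False
```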

Answer 3

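I would do it this way:

First, strip the unwanted characters from the string using either of the following two techniques, your choice:

a - Build a translation dictionary and use the translate method: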

>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution

b - Use the re.sub method:

>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]'%string.punctuation, '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution

Next, find the words that contain 'ab' or 'ba':

a - Split on whitespace and look for the desired substrings; this is the option I recommend:

>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']

b - Use the re.finditer method:

>>> pat
re.compile('\\b.*?(ab|ba).*?\\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
        print(m.group())


abolition
fabrics
probable
test case bank
halfback
1ablution
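
Note that the output above contains "test case bank" as a single match: .*? can run across spaces until it reaches an ab or ba. One possible way around this, sketched here under the assumption that each match should stay within one word, is to use \w* in place of .*?. The variable name word_pat is an assumption and not part of the answer.

```python
import re

s = 'abolition fabrics probable test case bank halfback 1ablution'

# \w* cannot cross whitespace, so every match is confined to one word.
word_pat = re.compile(r'\w*(?:ab|ba)\w*', re.IGNORECASE)

matches = [m.group() for m in word_pat.finditer(s)]
print(matches)
# ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
```

This gives the same list as option a, without the split.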

Answer 4

string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)

Answer 5

Try this:

[(),/]*([a-z]|(ba|ab))+[(),/]*
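
A quick way to check a pattern like this is to run it over a few sample words. In the sketch below (the sample words are assumptions), the pattern also accepts "test", because [a-z] on its own satisfies the repeated group, so a separate check for ab or ba would still be needed afterwards.

```python
import re

pattern = re.compile(r'[(),/]*([a-z]|(ba|ab))+[(),/]*')

# fullmatch requires the whole sample to fit the pattern.
results = {sample: bool(pattern.fullmatch(sample))
           for sample in ['(fabrics),', 'probable', 'test']}
print(results)
# {'(fabrics),': True, 'probable': True, 'test': True}
```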
