Question

I have a list of words, such as:

l = """abca
bcab
aaba
cccc
cbac
babb
"""

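I want to find the words whose first and last characters are the same, and whose two middle characters are both different from that first/last character.

The desired final result: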

['abca', 'bcab', 'cbac']

I tried:

re.findall('^(.)..\\1$', l, re.MULTILINE)

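But it returns all of the unwanted words as well. I thought of using [^...] somehow, but I couldn't figure it out. There is a way to do this with sets (to filter the results of the search above), but I'm looking for a regex.

Is it possible?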

Answer 1

There are lots of ways to do this. This is probably the simplest:

re.findall(r'''
           \b          #The beginning of a word (a word boundary)
           ([a-z])     #One letter
           (?!\w*\1\B) #The rest of this word may not contain the starting letter except at the end of the word
           [a-z]*      #Any number of other letters
           \1          #The starting letter we captured in step 2
           \b          #The end of the word (another word boundary)
           ''', l, re.IGNORECASE | re.VERBOSE)

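If you like, you can relax the requirement slightly by replacing [a-z] with \w. That allows digits and underscores as well as letters. You can also restrict the match to four-character words by changing the last * in the pattern to {2}.

Also note that I'm not very familiar with Python, so I'm assuming your use of findall is correct.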
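One caveat about that assumption: when a pattern contains a capturing group, re.findall returns the contents of the group rather than the whole match, so a call like the one above yields only the first letters. A minimal sketch of one workaround, using re.finditer and taking group 0 of each match (the names `pattern`, `first_letters` and `words` are illustrative, not from the question):

```python
import re

l = """abca
bcab
aaba
cccc
cbac
babb
"""

# Same pattern as above, written on one line
pattern = re.compile(r"\b([a-z])(?!\w*\1\B)[a-z]*\1\b", re.IGNORECASE)

# findall returns the contents of group 1, not the whole word
first_letters = pattern.findall(l)

# finditer yields match objects; group(0) is the full matched word
words = [m.group(0) for m in pattern.finditer(l)]

print(first_letters)  # ['a', 'b', 'c']
print(words)          # ['abca', 'bcab', 'cbac']
```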

Answer 2

Edit: fixed to use negative lookahead assertions instead of negative lookbehind assertions. See the comments from @AlanMoore and @bukzor for the explanation.

>>> [s for s in l.splitlines() if re.search(r'^(.)(?!\1).(?!\1).\1$', s)]
['abca', 'bcab', 'cbac']

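This solution uses negative lookahead assertions, which mean "match at the current position only if it is not followed by a match for something else." Now look at the lookahead assertion itself, (?!\1). All it means is "match here only if the next character is not the first character."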
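To see what the two (?!\1) checks contribute, here is a small sketch that runs the pattern with and without them on the sample words. The names `without_lookahead`, `with_lookahead`, `loose` and `strict` are made up for this illustration.

```python
import re

words = ["abca", "bcab", "aaba", "cccc", "cbac", "babb"]

# Same first and last character, no constraint on the middle two
without_lookahead = re.compile(r"^(.)..\1$")
# (?!\1) before each middle character rejects a repeat of the first one
with_lookahead = re.compile(r"^(.)(?!\1).(?!\1).\1$")

loose = [w for w in words if without_lookahead.search(w)]
strict = [w for w in words if with_lookahead.search(w)]

print(loose)   # all six sample words
print(strict)  # ['abca', 'bcab', 'cbac']
```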

Answer 3

Forget the regex.

[
    word
    for word in words.split()  # split() skips the empty string a trailing newline would leave
    if word[0] == word[-1]
    and word[0] not in word[1:-1]
]

Answer 4

Do you need to use a regex? Here is a more Pythonic way to do the same thing:

l = """abca
bcab
aaba
cccc
cbac
babb
"""

for word in l.split():
    # same first and last character, and that character absent from the middle
    if word[-1] == word[0] and word[0] not in word[1:-1]:
        print(word)
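
The test used in both non-regex answers indexes word[0], so it would raise IndexError if handed an empty string, and it accepts a one-letter word. If that matters for the real data, one possible variant wraps the test in a function with a length guard. The function name `same_ends_distinct_middle` and the minimum length of 2 are assumptions made for this sketch.

```python
def same_ends_distinct_middle(word):
    """Return True if word starts and ends with the same character and
    that character does not occur anywhere in between."""
    if len(word) < 2:  # assumption: empty and one-letter strings do not count
        return False
    return word[0] == word[-1] and word[0] not in word[1:-1]


l = """abca
bcab
aaba
cccc
cbac
babb
"""

result = [w for w in l.split() if same_ends_distinct_middle(w)]
print(result)  # ['abca', 'bcab', 'cbac']
```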

Answer 5

Here is how I would do it:

result = [m.group(0) for m in re.finditer(r"\b([a-z])(?:(?!\1)[a-z]){2}\1\b", subject)]

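This is similar to Justin's answer, except that where that one does a single lookahead up front, this one checks each letter as it is consumed.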

\b
([a-z])  # Capture the first letter.
(?:
  (?!\1)   # Unless it's the same as the first letter...
  [a-z]    # ...consume another letter.
){2}
\1
\b

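I don't know what your real data looks like, so I chose [a-z] arbitrarily because it works for your sample data. I limited the length to four characters for the same reason. As with Justin's answer, you may want to change {2} to *, +, or some other quantifier.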
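
As a rough illustration of how the quantifier choice changes the result, the sketch below swaps {2} for + and *. The sample string in `subject` and the helper `find_words` are invented for this example. It collects whole words with re.finditer, because re.findall returns only the captured first letter when the pattern has a capturing group.

```python
import re

# Hypothetical sample text with words of different lengths
subject = "abca abcda aba aa abaca"


def find_words(quantifier):
    # Build the pattern with the chosen quantifier on the middle group
    pattern = r"\b([a-z])(?:(?!\1)[a-z])" + quantifier + r"\1\b"
    # group(0) is the whole matched word
    return [m.group(0) for m in re.finditer(pattern, subject)]


print(find_words("{2}"))  # ['abca']
print(find_words("+"))    # ['abca', 'abcda', 'aba']
print(find_words("*"))    # ['abca', 'abcda', 'aba', 'aa']
```

With *, the middle may be empty, so a two-letter word such as "aa" also matches; + requires at least one middle letter.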

Answer 6

You can do this with negative lookahead or lookbehind assertions; see http://docs.python.org/library/re.html for details.

Answer 7

I'm no Python guru, but maybe this:

[m.group(0) for m in re.finditer(r'^(.)(?:(?!\1).)*\1$', l, re.MULTILINE)]

Expanded (with the multiline modifier):

^                # begin of line
  (.)            # capture grp 1, any char except newline
  (?:            # grouping
     (?!\1)         # Lookahead assertion, not what was in capture group 1 (backref to 1)
     .              # this is ok, grab any char except newline
  )*             # end grouping, do 0 or more times (could force length with {2} instead of *)
  \1             # backref to group 1, this character must be the same
$                # end of line
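
One Python-specific detail when trying this pattern: in an ordinary string literal, '\1' is an octal escape for the character with code 1, so the backreference only reaches the regex engine if the pattern is written as a raw string (or with the backslash doubled, as in the question). A short sketch, reusing the question's sample data; the names `pattern` and `matches` are illustrative:

```python
import re

l = """abca
bcab
aaba
cccc
cbac
babb
"""

print('\1' == '\x01')  # True: the plain literal is a single control character
print(len(r'\1'))      # 2: a backslash followed by the digit 1

# Raw string, so \1 is passed through to the regex engine as a backreference
pattern = re.compile(r'^(.)(?:(?!\1).)*\1$', re.MULTILINE)
matches = [m.group(0) for m in pattern.finditer(l)]
print(matches)  # ['abca', 'bcab', 'cbac']
```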

Using a regex to find words with the same or different characters
