Find two or more consecutive words starting with an uppercase letter and replace them with their acronym (regex)

Time: 2013-05-23 19:08:15

Tags: python regex words

I want to replace two or more consecutive words that start with an uppercase letter with their acronym. I managed to find the words with:

def find(name):
    return re.findall(r'([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)', name)

But when I try to replace the words, I can't get it to work.

Here is what I have so far:

import re


def main():
    name = raw_input(" Enter name: ")

    print find(name)


def find(name):
    return re.sub(r'([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)', replacement, name)


def replacement(match):
    return match.group[0].upper()

main()

For example:

Input: I went to the Annual General Meeting. Output: I went to the AGM.

Any help is appreciated.

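2 answers:

Answer 0: (score: 1)

Description

Here I use two separate expressions. The first matches every run of capitalized words that contains two or more words. The second pulls out the first letter of each word in that run. The two expressions are then tied together with some logic to replace the values in the source string.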

(?:^|\s+)((?:\s*\b[A-Z]\w{1,}\b){2,})

\b([A-Z])

Example

$Regex = '(?:^|\s+)((?:\s*\b[A-Z]\w{1,}\b){2,})'
$String = 'I went to the Annual General Meeting with some guy named Scott Jones on Perl Compatible Regular Expressions. '

Write-Host start with 
write-host $String
Write-Host
Write-Host found
$Matches = @()
([regex]"$Regex").matches($String) | foreach {
    $FoundThis = $_.Groups[1].Value
    write-host "group one $($_.Groups[1].Index) = '$($FoundThis)'"

    [string]$Acronym = ""
    ([regex]"\b([A-Z])").matches($FoundThis) | foreach {
        $Acronym += $_.Groups[1].Value
        } # next match

    $String = $String -replace $FoundThis, $Acronym
    } # next match


Write-Host $String

Output

start with
I went to the Annual General Meeting with some guy named Scott Jones on Perl Compatible Regular Expressions. 

found
group one 14 = 'Annual General Meeting'
group one 57 = 'Scott Jones'
group one 72 = 'Perl Compatible Regular Expressions'
I went to the AGM with some guy named SJ on PCRE. 

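Disclaimers

  • Yes, I know the OP asked for a Python example, but I'm more familiar with PowerShell. The logic is the same.
  • As written, this will also match proper names, as well as the case where the first word of a sentence is capitalized and is followed by a second capitalized word. So you will need to do your own error checking.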
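
For readers who want the same idea in Python, here is a minimal sketch of the two-expression approach described in this answer. It is a port for illustration, not the answerer's own code: the names `RUN`, `INITIAL`, and `abbreviate` are made up, and `str.replace` stands in for PowerShell's regex-based `-replace`. The last call shows the caveat from the list above, where a capitalized sentence-initial word gets swept into the acronym.

```python
import re

# First expression from the answer: a run of two or more capitalized words.
RUN = re.compile(r'(?:^|\s+)((?:\s*\b[A-Z]\w{1,}\b){2,})')
# Second expression from the answer: the leading capital of each word.
INITIAL = re.compile(r'\b([A-Z])')


def abbreviate(text):
    """Replace each run of capitalized words with its initials (hypothetical helper)."""
    # findall returns the contents of group 1 for every match of RUN.
    for run in RUN.findall(text):
        acronym = ''.join(INITIAL.findall(run))
        # Literal replacement of the matched run in the source string.
        text = text.replace(run, acronym)
    return text


print(abbreviate('I went to the Annual General Meeting with some guy '
                 'named Scott Jones on Perl Compatible Regular Expressions. '))
# Caveat: "The" starts the sentence and is absorbed into the acronym.
print(abbreviate('The Annual General Meeting ran late.'))
```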

Answer 1: (score: 1)

If you modify the replacement function as follows, your example should work:

def replacement(match):
    return ''.join(y[0] for y in match.group(0).split())
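
Put together with the pattern from the question, a complete version might look like the sketch below. It assumes Python 3 (the question uses Python 2's `raw_input` and `print` statement), and the names `PATTERN` and `abbreviate` are hypothetical. The error in the question comes from `match.group[0]`: `group` is a method, so subscripting it raises a `TypeError`, and it has to be called as `match.group(0)` instead.

```python
import re

# Pattern from the question: a capitalized word followed by one or more
# further capitalized words, each separated by a single whitespace character.
PATTERN = re.compile(r'([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)')


def replacement(match):
    # Call group(0) to get the matched text, then keep each word's first letter.
    return ''.join(word[0] for word in match.group(0).split())


def abbreviate(text):
    # re.sub accepts a function; it is called once per match.
    return PATTERN.sub(replacement, text)


print(abbreviate('I went to the Annual General Meeting.'))
```

Note that this pattern requires lowercase letters after each capital, so a lone "I" or an all-caps word is left untouched.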