Regex to find unnecessarily capitalized words

Time: 2011-12-03 18:31:06

Tags: regex

Here is my text:

TESTING TESTING test test test test test

If more than 50% of the letters in the sentence are uppercase, I want the regex to return true (or match).

In this case it would return false, because only 14 of the 34 letters are uppercase (the other 20 are lowercase).
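The count for the sample text can be checked with a short sketch. Python is used here purely for illustration, and the variable names are assumptions that do not appear in the original question.

```python
import re

text = "TESTING TESTING test test test test test"

# Count ASCII uppercase and lowercase letters separately.
upper = len(re.findall(r"[A-Z]", text))
lower = len(re.findall(r"[a-z]", text))

# Share of letters that are uppercase, as a percentage.
pct_upper = 100 * upper / (upper + lower)

print(upper, lower, round(pct_upper, 1))  # 14 20 41.2
```

Since roughly 41% is below the 50% threshold, the desired result for this text is false.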

In AppleScript, I would do it like this:

set a to characters of "abcdefghijklmnopqrstuvwxyz" -- lowercase letters (not used below)
set ac to characters of "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set this_message to characters of "TEST TEST TEST TEST test test test test test test"
set x to 0 -- Counter: uppercase letters
set y to 0 -- Counter: spaces and punctuation, excluded from the total
repeat with i from 1 to number of items in this_message
    set this_item to item i of this_message
    considering case
        if this_item is not " " then
            if this_item is in ac then
                set x to x + 1
            end if
        end if
        if this_item is in {" ", ",", ".", "-"} then
            set y to y + 1
        end if
    end considering
end repeat
try
    -- Compare the percentage directly instead of rounding the ratio first.
    if (x / ((count this_message) - y)) * 100 > 50 then
        return true
    else
        return false
    end if
on error
    -- Reached on division by zero (no countable characters).
    return false
end try

2 answers:

Answer 0 (score: 2)

Here is a PHP function that returns TRUE if more than half of a string consists of capital letters:

// Test if more than half of string consists of CAPs.
function isMostlyCaps($text) {
    $len = strlen($text);
    if ($len) {  // Skip empty strings (avoids division by zero).
        $capscnt = preg_match_all('/[A-Z]/', $text, $matches);
        if ($capscnt/$len > 0.5) return TRUE;
    }
    return FALSE;
}

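The function above compares the number of capital letters with the total length of the string (including whitespace and non-letters). If you would rather compare against the number of non-whitespace characters, the function is easily modified: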

// Test if more than half of non-whitespace chars in string are CAPs.
function isMostlyCaps($text) {
    $len = preg_match_all('/\S/', $text, $matches);
    if ($len) {  // Skip strings with no non-whitespace chars (avoids division by zero).
        $capscnt = preg_match_all('/[A-Z]/', $text, $matches);
        if ($capscnt/$len > 0.5) return TRUE;
    }
    return FALSE;
}
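To make the effect of the denominator concrete, here is a minimal Python sketch of both variants applied to the sample text from the question. The function name `caps_ratio` and the `ignore_whitespace` flag are hypothetical and introduced only for this illustration. The sketch assumes ASCII input, where PHP's byte-based `strlen` and Python's `len` agree.

```python
import re

def caps_ratio(text: str, ignore_whitespace: bool = False) -> float:
    """Return the fraction of counted characters that are ASCII capitals."""
    if ignore_whitespace:
        # Denominator: non-whitespace characters only.
        total = len(re.findall(r"\S", text))
    else:
        # Denominator: every character, spaces included.
        total = len(text)
    if total == 0:
        return 0.0  # nothing to count; avoids division by zero
    caps = len(re.findall(r"[A-Z]", text))
    return caps / total

sample = "TESTING TESTING test test test test test"
print(caps_ratio(sample))                          # 14 of 40 characters
print(caps_ratio(sample, ignore_whitespace=True))  # 14 of 34 characters
```

Both ratios stay below 0.5 for this text, but the two variants can disagree on strings that contain a lot of whitespace.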

Here is a version that counts whole words instead:

// Test if more than half of "words" in string are all CAPs.
function isMostlyCapWords($text) {
    // For our purpose a "word" is a sequence of non-whitespace chars.
    $wordcnt = preg_match_all('/\S+/', $text, $matches);
    if ($wordcnt) {  // Skip strings with no words (avoids division by zero).
        $capscnt = preg_match_all('/\b[A-Z]+\b/', $text, $matches);
        if ($capscnt/$wordcnt > 0.5) return TRUE;
    }
    return FALSE;
}
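One detail worth noting in the word-based version: `\S+` and `\b[A-Z]+\b` do not delimit words in the same way, so a token such as "AB-CD" adds one to the word count but two to the capitalized count. The following Python sketch shows one possible stricter variant, not the answer's exact behavior. It assumes a token counts as capitalized when it contains at least one uppercase ASCII letter and no lowercase ones. The function name is hypothetical.

```python
import re

def is_mostly_cap_words(text: str) -> bool:
    """True if more than half of the whitespace-separated tokens are all caps."""
    words = text.split()  # tokens separated by runs of whitespace
    if not words:
        return False  # no words; avoids division by zero
    cap_words = [
        w for w in words
        # Assumption: at least one A-Z and no a-z; punctuation is ignored.
        if re.search(r"[A-Z]", w) and not re.search(r"[a-z]", w)
    ]
    return len(cap_words) / len(words) > 0.5

print(is_mostly_cap_words("TESTING TESTING test test test test test"))  # False
print(is_mostly_cap_words("HELLO, WORLD foo"))                          # True
```

With this definition "AB-CD ef" gives one capitalized token out of two, which is not more than half.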

Answer 1 (score: 1)

In Perl:

sub mostly_caps {
    my $string = shift;
    my $upper = $string =~ tr/A-Z//;
    my $lower = $string =~ tr/a-z//;
    return $upper > $lower;  # strictly more than half of the letters are uppercase
}

For bonus points, a version that takes an arbitrary percentage as an argument:

sub caps_pct {
    my ( $string, $pct ) = @_;
    my $upper = $string =~ tr/A-Z//;
    my $lower = $string =~ tr/a-z//;
    my $letters = $upper + $lower;
    return 0 unless $letters;  # no letters at all: avoid division by zero
    return $upper / $letters >= $pct / 100;
}

It should be easy to adapt this to PHP or any other language.
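As an illustration of such an adaptation, here is a minimal Python sketch of the percentage-based check. It keeps the `caps_pct` name from the Perl version. Returning False for a string with no letters is an assumption made here, since the ratio is undefined in that case.

```python
def caps_pct(text: str, pct: float) -> bool:
    """True if at least pct percent of the ASCII letters in text are uppercase."""
    upper = sum(1 for ch in text if "A" <= ch <= "Z")
    lower = sum(1 for ch in text if "a" <= ch <= "z")
    letters = upper + lower
    if letters == 0:
        return False  # assumption: a string without letters never qualifies
    return upper / letters >= pct / 100

sample = "TESTING TESTING test test test test test"
print(caps_pct(sample, 50))  # False: 14 of 34 letters is below 50%
print(caps_pct(sample, 40))  # True: 14 of 34 letters is above 40%
```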