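R: Using regmatches to extract certain characters

Time: 2016-02-21 21:31:12

Tags: regex r

I am using regmatches to extract only the uppercase letters from code, but every lowercase letter and digit comes back as "". Is there a way to extract only the uppercase letters, without the ""?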

static function tidy_links_cb($m)
{
    $blocks = 'div|ul|li|dl|form|fieldset|mena|nav|table|tr|td|th|address|article|aside|blockquote|dir|div|dl|fieldset|footer|form|h1|h2|h3|h4|h5|h6|header|hr|menu|nav|ol|p|pre|section|table|ul';
    if (preg_match('~<('.$blocks.')~is', $m[1])) {
         // THIS LINK CONTAINS BLOCK ELEMENT
        return '<alink'.$m[1].'</alink>';
    }
    return $m[0];
}

static function Tidy($html)
{
    $config = array(
        'wrap' => 0,
        'show-body-only' => true,
        'enclose-text' => true,
        'output-xhtml' => true,
        'doctype' => 'omit',
        'bare' => true,
        'char-encoding' => 'raw',
        'input-encoding' => 'raw',
        'output-encoding' => 'raw',
        'quiet' => true,
        'hide-comments' => true,
        'new-blocklevel-tags' => 'section alink',
        'new-inline-tags' => 'button',
        'drop-empty-elements' => false,
    );

   // you cannot simply replace all <A> tags because Tidy would mess up the inline ones
    $html = preg_replace_callback('~<a(.*)</a>~isU', array(self::class, 'tidy_links_cb'), $html);

    $html = tidy_parse_string($html, $config);
    tidy_clean_repair($html);

    $html = str_replace('<alink ', '<a ', $html);
    $html = str_replace('alink>', 'a>', $html);

    $html = (string) $html;
    return $html;
}

2 answers:

Answer 0 (score: 2)

gsub("[^A-Z]", "", code)
# [1] "CONGRATULATIONSYOUAREASUPERNERD"
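
If you would rather stay with regmatches, as the question asks, a minimal sketch might look like the following. The sample string is hypothetical, since the question's actual `code` value is not shown. The pattern uses `+` so that only non-empty runs of uppercase letters are matched; a pattern that can match zero characters (for example `[A-Z]*`) is one way empty strings can end up in the result, though whether that is what happened here is a guess.

```r
# Hypothetical sample input; the question's real `code` string is not shown.
code <- "He11o wOrLd, THIS is 4 Test"

# gregexpr() locates every run of one or more uppercase ASCII letters;
# regmatches() returns just those runs (a list with one element per input string).
runs <- regmatches(code, gregexpr("[A-Z]+", code))[[1]]

# Glue the runs back together into a single string.
upper_only <- paste(runs, collapse = "")
print(upper_only)
```

For this input the result should be the same string that `gsub("[^A-Z]", "", code)` returns, so the gsub version is simply the shorter way to write it.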

Answer 1 (score: 2)

[^A-Z] is fine, but [^[:upper:]] is slightly better because it will not break in unusual locales.

gsub("[^[:upper:]]", "", code)
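
To see where the two patterns can differ, here is a small sketch with a hypothetical input that contains a non-ASCII uppercase letter. What `[:upper:]` matches depends on the current locale and platform, so no output is claimed for the accented character; run it on your own system to check.

```r
# Hypothetical input with one ASCII uppercase letter (B) and one
# non-ASCII uppercase letter (É, written as the escape \u00c9).
x <- "aB\u00c9d"

# Keeps only characters in the explicit range A-Z.
ascii_only <- gsub("[^A-Z]", "", x)

# Keeps whatever the current locale classifies as uppercase.
class_based <- gsub("[^[:upper:]]", "", x)

print(ascii_only)
print(class_based)
```

Both calls keep the plain "B"; whether the accented letter survives the second call is the locale-dependent part.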

For better readability (though it may be overkill for this example) you might want stringr::str_extract, but I am not sure how to do that cleanly:

library(stringr)
str_c(str_extract_all(code,"[[:Lu:]]+")[[1]],collapse="")