Question

I have a tibble, tib, that looks like this:

  A     B     C     D    
  <chr> <chr> <chr> <chr>
1 X123  X456  K234  V333 
2 X456  Z000  L888  B323 
3 X789  ZZZZ  D345  O999 
4 M111  M111  M111  M111 
.
.
.
(5000 rows)

I also have a vector, as follows:

> vec <- c("X123","X456")
> vec
[1] "X123" "X456"

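I'm looking for a way to search the tibble (which has, say, 5000 rows) and add a logical column on its right-hand side that is TRUE or FALSE depending on whether any value in that row of tib matches a value in vec. My target output is as follows: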

  A     B     C     D      lgl
<chr> <chr> <chr> <chr>  <lgl>
1 X123  X456  K234  V333   TRUE
2 X456  Z000  L888  B323   TRUE
3 X789  ZZZZ  D345  O999   FALSE
4 M111  M111  M111  M111   FALSE

This is what I have so far:

> tib %>% 
+   pmap_lgl(~any(..1 %in% vec))
[1]  TRUE  TRUE FALSE FALSE

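This gets the result I want, but I'm a bit confused about the syntax.

Why does the above work (i.e. using only ..1) rather than having to use ..1, ..2, ..3 and ..4? My understanding is that pmap builds a vector from each row of the input, so I assumed that ..1 above refers to the vector c("X123","X456","K234","V333") for row #1, c("X456","Z000","L888","B323") for row #2, and so on.

Finally, I have two questions:

How do I attach this new logical vector to tib above? I had no luck with: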

tib %>% mutate(lgl = pmap_lgl(~any(..1 %in% vec)))

Error in mutate_impl(.data, dots): Evaluation error: argument ".f" is missing, with no default.

If I want to access each individual column within each row (e.g. "X123" in the first row) inside pmap, how do I do that with purrr's syntax?

Answer 1

Keeping it simple, you can use the base function any together with apply:

df$lgl <- apply(df, 1, function(x) any(x %in% vec))
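As a self-contained illustration of the apply approach, here is a minimal base R sketch. The data frame name `tib_df` and the helper vector `cols` are hypothetical; the values are the four rows shown in the question.

```r
# Base R only; `tib_df` is a hypothetical stand-in for the asker's tibble.
tib_df <- data.frame(
  A = c("X123", "X456", "X789", "M111"),
  B = c("X456", "Z000", "ZZZZ", "M111"),
  C = c("K234", "L888", "D345", "M111"),
  D = c("V333", "B323", "O999", "M111"),
  stringsAsFactors = FALSE
)
vec <- c("X123", "X456")

# Name the columns to search up front, so that rerunning the line
# never scans the lgl column itself.
cols <- c("A", "B", "C", "D")

# apply() with MARGIN = 1 hands each row to the function as a character vector.
tib_df$lgl <- apply(tib_df[cols], 1, function(x) any(x %in% vec))
```

Rows 1 and 2 contain a value from vec, so lgl comes out TRUE for them and FALSE for rows 3 and 4.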

Answer 2

You can use add_column and pmap_lgl together with a helper function to get a one-line tidyverse solution, similar to @YOLO's base apply solution.

library(tidyverse)

df <- tibble(A = c('X123', 'X456', 'X789', 'M111'),
             B = c('X456', 'Z000', 'ZZZZ', 'M111'),
             C = c('K234', 'L888', 'D345', 'M111'),
             D = c('V333', 'B323', '0999', 'M111'))

vec <- c('V333', '0999')

check <- function(...) {
  any(c(...) %in% vec)
}

add_column(df, row_check = pmap_lgl(df, check))

# A tibble: 4 x 5
  A     B     C     D     row_check
  <chr> <chr> <chr> <chr> <lgl>
1 X123  X456  K234  V333  TRUE
2 X456  Z000  L888  B323  FALSE
3 X789  ZZZZ  D345  0999  TRUE
4 M111  M111  M111  M111  FALSE

The caveat of using ... in the function is that it operates on all columns of the tibble or data frame supplied. If you have additional columns, you will need to either specify the function arguments or restrict the data that is passed in.
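The two ways around that caveat might look like the following sketch. The tibble `df2`, its extra `id` column, and the helper `check_ab` are hypothetical names introduced only for illustration.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical data: `id` is an extra column that should not be searched.
df2 <- tibble(id = 1:2,
              A  = c("X123", "M111"),
              B  = c("Z000", "M111"))
vec <- c("X123")

check <- function(...) any(c(...) %in% vec)

# Option 1: restrict the data handed to pmap_lgl to the columns of interest.
res_select <- add_column(df2, row_check = pmap_lgl(select(df2, A, B), check))

# Option 2: name the arguments to search; `...` absorbs the remaining columns.
check_ab <- function(A, B, ...) any(c(A, B) %in% vec)
res_named <- add_column(df2, row_check = pmap_lgl(df2, check_ab))
```

Both variants should give the same row_check column here. Option 2 works because pmap passes each column to the function as an argument named after that column.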

Answer 3

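..1, ..2 and so on refer to the arguments by position (the first argument, the second argument, etc.). We can use them together with mutate and rowwise to get the desired result: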

tib %>%
    mutate(lgl = pmap(., ~c(..1, ..2, ..3, ..4) %in% vec)) %>%
    rowwise() %>%
    mutate(lgl = any(unlist(lgl)))

  V1    V2    V3    V4    lgl  
  <chr> <chr> <chr> <chr> <lgl>
1 X123  X456  K234  V333  TRUE 
2 X456  Z000  L888  B323  TRUE 
3 X789  ZZZZ  D345  O999  FALSE
4 M111  M111  M111  M111  FALSE

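The call to pmap takes . as its first argument, which is the tibble we are working with. We then use c(..1, ..2, ..3, ..4) to build, for each row, a vector of the values from the four columns, and %in% turns it into a logical vector. Because pmap returns a list, lgl is a list column at that point, so we then need rowwise to reduce each row's entry to a single logical value with any.

The previous iteration of my answer returned the wrong result for vec = c('M111'); it now works correctly: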
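To make the meaning of ..1 concrete, here is a small sketch. The tibble `demo` is a hypothetical example in which the only match in row 2 sits in column B. It also shows pmap_lgl receiving the data explicitly inside mutate; the call in the question failed there because the formula was the only argument, so it was taken as the list to iterate over and .f was left missing.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical two-row tibble: row 2 matches vec only through column B.
demo <- tibble(A = c("X123", "M111"), B = c("Z000", "X456"))
vec <- c("X123", "X456")

# ..1 is the value of the first column (A) in each row, so row 2 is missed.
first_only <- pmap_lgl(demo, ~ any(..1 %in% vec))

# c(...) collects the values from every column of the row.
all_cols <- pmap_lgl(demo, ~ any(c(...) %in% vec))

# Inside mutate, pass the data to pmap_lgl explicitly as its first argument.
demo_lgl <- mutate(demo, lgl = pmap_lgl(demo, ~ any(c(...) %in% vec)))
```

Here first_only is TRUE, FALSE while all_cols is TRUE, TRUE. This suggests the ..1 version in the question matched the target output only because, in the rows shown, the matching values happen to sit in column A.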

tib %>%
    mutate(lgl = pmap(., ~c(..1, ..2, ..3, ..4) %in% c('M111'))) %>%
    rowwise() %>%
    mutate(lgl = any(unlist(lgl)))

  V1    V2    V3    V4    lgl  
  <chr> <chr> <chr> <chr> <lgl>
1 X123  X456  K234  V333  FALSE
2 X456  Z000  L888  B323  FALSE
3 X789  ZZZZ  D345  O999  FALSE
4 M111  M111  M111  M111  TRUE

Here's a link to the documentation for the function, which may also be useful.

Row-wise operation to see if any column value is in another list
