Question

I have been tidying some unstructured text and splitting phrases into separate fields, mainly for readability. The resulting data frame looks like this (i.e. it contains NAs):

behaviour <- data.frame(field_1 = c("cat eats meal", "cat watches fish", "cat watches frog", NA),
                        field_2 = c("bird flies away", "cat watches bird", "cat watches fish", NA),
                        field_3 = c("cat is bored", NA, NA, "cat watches sky"))

I am now interested in extracting all values that contain the word "watches" while preserving the structure of the data set.

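The two data sets are as follows:

library(purrr); library(stringr)  # map_df(), %>% and str_detect()
# TRUE where a value contains "watches"; NA values stay NA
watches <- behaviour %>%
  map_df(~ str_detect(., "watches"))

behaviour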

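I would like to obtain a new data frame in which a value from behaviour is kept only where the corresponding value in watches is TRUE, and in which the structure of the original data frame is preserved. I tried converting both to matrices and subsetting with behaviour[watches], but the output is a character vector that does not retain the structure.

I am now wondering whether this calls for a mutate_all on behaviour, i.e. if watches == T, then paste or keep the original value from behaviour?

Thanks in advance!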

Answer 1

Since you already have watches, you can assign NA directly:

behaviour[!watches] <- NA
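
To illustrate why assignment works where behaviour[watches] did not, here is a minimal self-contained sketch. It assumes a logical matrix built in base R with sapply() and grepl() rather than the purrr version from the question, and the names flat and kept are only illustrative. Extracting with a logical matrix flattens the result to a vector, while replacing through the same kind of index leaves the data frame's shape alone.

```r
behaviour <- data.frame(
  field_1 = c("cat eats meal", "cat watches fish", "cat watches frog", NA),
  field_2 = c("bird flies away", "cat watches bird", "cat watches fish", NA),
  field_3 = c("cat is bored", NA, NA, "cat watches sky"),
  stringsAsFactors = FALSE
)

# Assumption: base R stand-in for the question's `watches`.
# grepl() returns FALSE (not NA) for NA input, so this matrix has no NAs.
watches <- sapply(behaviour, function(x) grepl("watches", x))

# Extraction with a logical matrix returns a plain character vector
flat <- behaviour[watches]

# Replacement through the negated matrix keeps rows and columns intact
kept <- behaviour
kept[!watches] <- NA
```

Here flat is a character vector holding only the matching strings, whereas kept is still a data frame with the same rows and columns as behaviour.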

Without creating watches, we can use mutate_all as:

library(dplyr)
behaviour %>% mutate_all(~replace(., !stringr::str_detect(., "watches"), NA))

#           field_1          field_2         field_3
#1             <NA>             <NA>            <NA>
#2 cat watches fish cat watches bird            <NA>
#3 cat watches frog cat watches fish            <NA>
#4             <NA>             <NA> cat watches sky
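
In dplyr 1.0.0 and later, mutate_all() is superseded by across(). A sketch of the same idea in that style might look like the following. It assumes dplyr >= 1.0.0 and stringr are installed, and the name result is only illustrative.

```r
library(dplyr)
library(stringr)

behaviour <- data.frame(
  field_1 = c("cat eats meal", "cat watches fish", "cat watches frog", NA),
  field_2 = c("bird flies away", "cat watches bird", "cat watches fish", NA),
  field_3 = c("cat is bored", NA, NA, "cat watches sky"),
  stringsAsFactors = FALSE
)

# Apply the same replace() logic to every column via across()
result <- behaviour %>%
  mutate(across(everything(),
                ~ replace(.x, !str_detect(.x, "watches"), NA)))
```

Values that are already NA stay NA, because str_detect() returns NA for them and replace() with a single replacement value tolerates NA in the logical index.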

In base R, we can use lapply with grepl:

behaviour[] <- lapply(behaviour, function(x) replace(x, !grepl('watches', x), NA))
#or with grep
#behaviour[] <- lapply(behaviour, function(x) replace(x, -grep('watches', x), NA))
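
One caveat about the grep variant, as a small sketch: if a column contains no match at all, grep() returns integer(0), and negating that is still an empty index, so nothing gets replaced. The vector x below is a made-up example, not part of the question's data.

```r
# Hypothetical column in which no element contains "watches"
x <- c("cat is bored", "cat sleeps")

# grepl() yields FALSE for every element, so every element becomes NA
with_grepl <- replace(x, !grepl("watches", x), NA)

# grep() yields integer(0); -integer(0) is also empty, so x is left unchanged
with_grep <- replace(x, -grep("watches", x), NA)
```

In the question's data every column has at least one match, so both variants give the same result there. For arbitrary data the grepl version is the safer choice.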

Is there a way to subset a data frame by a logical data frame and preserve the original data frame structure in R?
