Question

How can I split a column at a delimiter and fill the new columns with values that correspond to each row's contents?

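I have a column, functionality, that originally contained some combination of up to five sentences. I used mutate() to replace those sentences with keywords, as follows: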

mutate(functionality = str_replace(functionality, 
"A long sentence about audits.", 
"audits")) %>% mutate(functionality = str_replace(functionality, 
"A long sentence about patterns.", 
"patterns")) %>% mutate(functionality = str_replace(functionality, 
"A long sentence about monitoring.", 
"monitoring")) %>% mutate(functionality = str_replace(functionality, 
"A long sentence about reviews.", 
"reviews")) %>% mutate(functionality = str_replace(functionality, 
"A long sentence about investigations.", 
"investigations")) %>% as.data.frame()

<sup>Created on 2019-01-04 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup>

This produces the following column:

| functionality                                     |
|---------------------------------------------------|
| monitoring investigations patterns                |
| audits patterns                                   |
| reviews audits monitoring patterns                |
| reviews audits monitoring investigations patterns |

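I want to split functionality into the separate columns monitoring, investigations, patterns, audits, and reviews, and fill each one with a value that reflects whether that keyword appears in the original column, e.g.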

| monitoring | investigations | patterns | reviews | audits |
|------------|----------------|----------|---------|--------|
| 1          | 1              | 1        | 0       | 0      |
| 0          | 0              | 1        | 0       | 1      |
| 1          | 0              | 1        | 1       | 1      |
| 1          | 1              | 1        | 1       | 1      |

I haven't had much success with grepl or mutate_at, but I'm still fairly new to R, so I may not be using the right code.

Answer 1

We can strsplit the functionality column on spaces and then use mtabulate to get the frequencies

library(qdapTools)
mtabulate(strsplit(df1$functionality, " +"))

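If the functionality column is of class factor (suggested by the as.data.frame wrapping at the end, where stringsAsFactors = TRUE was the default in R versions before 4.0.0), convert it to character class before splitting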

mtabulate(strsplit(as.character(df1$functionality), " +"))
#   audits investigations monitoring patterns reviews
#1      0              1          1        1       0
#2      1              0          0        1       0
#3      1              0          1        1       1
#4      1              1          1        1       1
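As a small illustration of why the as.character() conversion matters, the sketch below uses a made-up factor `f` (not the OP's data) to show that strsplit() rejects factor input while the converted vector splits as expected.

```r
# Hypothetical factor column, standing in for df1$functionality
f <- factor(c("audits patterns", "reviews audits"))

# strsplit() only accepts character vectors, so the factor triggers an error;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(strsplit(f, " +"), error = function(e) conditionMessage(e))

# After conversion to character, the split works
ok <- strsplit(as.character(f), " +")
```

Here `res` holds the captured error message and `ok` is a list with one character vector of keywords per row.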

Or, since the OP is using the tidyverse, we can use separate_rows/spread to get the expected output

library(tidyverse)
df1 %>% 
  rownames_to_column('rn') %>% 
  separate_rows(functionality) %>%
  count(rn, functionality) %>%
  spread(functionality, n, fill = 0) %>% 
  select(-rn)
# A tibble: 4 x 5
#  audits investigations monitoring patterns reviews
#   <dbl>          <dbl>      <dbl>    <dbl>   <dbl>
#1      0              1          1        1       0
#2      1              0          0        1       0
#3      1              0          1        1       1
#4      1              1          1        1       1
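In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a sketch of the same pipeline with that function, assuming tidyr 1.1.0 or later (for the scalar `values_fill`) and a character functionality column. The final select() is an addition that puts the columns in the order shown in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the "Data" section of this answer
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(rn = row_number()) %>%                    # keep track of the source row
  separate_rows(functionality, sep = " +") %>%     # one keyword per row
  count(rn, functionality) %>%                     # frequency per row and keyword
  pivot_wider(names_from = functionality,
              values_from = n,
              values_fill = 0) %>%                 # absent keywords become 0
  select(monitoring, investigations, patterns, reviews, audits)
```

Without the select() step, the column order follows the order in which keywords first appear in the counted data, and the helper column `rn` is kept.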

Or, a base R option is to convert the list of vectors to a two-column data.frame with stack and then get the table

table(stack(setNames(strsplit(as.character(df1$functionality), " +"), 
                  row.names(df1)))[2:1])
# values
#ind audits investigations monitoring patterns reviews
#  1      0              1          1        1       0
#  2      1              0          0        1       0
#  3      1              0          1        1       1
#  4      1              1          1        1       1
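If the set of keywords is known in advance, another base R possibility is to test each row against a fixed keyword vector. This is only a sketch: the `keywords` vector and the resulting column order are assumptions taken from the table in the question, and the result is a 0/1 indicator rather than a count.

```r
# Same data as in the "Data" section of this answer
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

# Assumed keyword list, in the column order shown in the question
keywords <- c("monitoring", "investigations", "patterns", "reviews", "audits")

# Split each row into its keywords
tokens <- strsplit(as.character(df1$functionality), " +")

# For every row, flag which keywords are present; vapply() returns a
# keywords-by-rows integer matrix, so transpose it to get one row per record
flags <- vapply(tokens,
                function(x) as.integer(keywords %in% x),
                integer(length(keywords)))
res <- as.data.frame(t(flags))
names(res) <- keywords
```

Because the columns come from `keywords`, a keyword that never occurs in the data still gets an all-zero column, which the table-based approaches would omit.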

Data

df1 <- structure(list(functionality = c("monitoring investigations patterns", 
"audits patterns", "reviews audits monitoring patterns", 
"reviews audits monitoring investigations patterns"
)), class = "data.frame", row.names = c(NA, -4L))
