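Split a column at a delimiter and fill new columns with the split values

Time: 2019-01-04 19:24:17

Tags: r mutate

How can I split a column at a delimiter and fill the resulting new columns with values that correspond to the contents of each row?

I have a column functionality that originally contained some combination of up to five sentences. I used mutate() to replace each of those sentences with a keyword, as follows: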

mutate(functionality = str_replace(functionality,
                                   "A long sentence about audits.",
                                   "audits")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about patterns.",
                                     "patterns")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about monitoring.",
                                     "monitoring")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about reviews.",
                                     "reviews")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about investigations.",
                                     "investigations")) %>%
  as.data.frame()

<sup>Created on 2019-01-04 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup>

This produces the following column:

| functionality                                     |
|---------------------------------------------------|
| monitoring investigations patterns                |
| audits patterns                                   |
| reviews audits monitoring patterns                |
| reviews audits monitoring investigations patterns |

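I would like to split functionality into the separate columns monitoring, investigations, patterns, audits and reviews, and fill them with values that correspond to the original column, for example: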

| monitoring | investigations | patterns | reviews | audits |
|------------|----------------|----------|---------|--------|
| 1          | 1              | 1        | 0       | 0      |
| 0          | 0              | 1        | 0       | 1      |
| 1          | 0              | 1        | 1       | 1      |
| 1          | 1              | 1        | 1       | 1      |

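I have not had much success with grepl or mutate_at, but I am still fairly new to R, so I may not be using the right code.

1 Answer:

Answer 0: (score: 0)

We can split the functionality column on spaces with strsplit and then use mtabulate to get the frequencies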

library(qdapTools)
mtabulate(strsplit(df1$functionality, " +"))

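If the column functionality is of class factor (which the as.data.frame call at the end of the pipeline suggests; before R 4.0.0 the default was stringsAsFactors = TRUE), convert it to character before splitting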

mtabulate(strsplit(as.character(df1$functionality), " +"))
#   audits investigations monitoring patterns reviews
#1      0              1          1        1       0
#2      1              0          0        1       0
#3      1              0          1        1       1
#4      1              1          1        1       1
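The reason for the as.character step can be seen in a small base R sketch. The vector f below is a made-up stand-in for a factor column, not the OP's data; strsplit only accepts character input, so calling it on a factor raises an error, which tryCatch captures here.

```r
# Hypothetical factor column, standing in for df1$functionality stored as a factor
f <- factor(c("audits patterns", "reviews audits monitoring patterns"))

# strsplit() refuses non-character input; keep the error message instead of stopping
res <- tryCatch(strsplit(f, " +"),
                error = function(e) conditionMessage(e))
print(res)

# Converting to character first makes the split work
parts <- strsplit(as.character(f), " +")
print(parts)
```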

Or, since the OP is using the tidyverse, we can use separate_rows/spread to get the expected output

library(tidyverse)
df1 %>% 
  rownames_to_column('rn') %>% 
  separate_rows(functionality) %>%
  count(rn, functionality) %>%
  spread(functionality, n, fill = 0) %>% 
  select(-rn)
# A tibble: 4 x 5
#  audits investigations monitoring patterns reviews
#   <dbl>          <dbl>      <dbl>    <dbl>   <dbl>
#1      0              1          1        1       0
#2      1              0          0        1       0
#3      1              0          1        1       1
#4      1              1          1        1       1
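In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a sketch of the same pipeline with that function, assuming a reasonably recent tidyr; the variable name wide is only for illustration. Note that pivot_wider() orders the new columns by first appearance rather than alphabetically, so the column order may differ from the output above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the answer
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  rownames_to_column("rn") %>%                    # keep track of the original row
  separate_rows(functionality, sep = " +") %>%    # one keyword per row
  count(rn, functionality) %>%                    # n = 1 for every keyword present
  pivot_wider(names_from = functionality,
              values_from = n,
              values_fill = list(n = 0L)) %>%     # absent keywords become 0
  arrange(as.integer(rn)) %>%                     # restore the original row order
  select(-rn)

print(wide)
```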

Alternatively, a base R option is to convert the list of vectors into a two-column data.frame with stack and then take the table

table(stack(setNames(strsplit(as.character(df1$functionality), " +"), 
                  row.names(df1)))[2:1])
# values
#ind audits investigations monitoring patterns reviews
#  1      0              1          1        1       0
#  2      1              0          0        1       0
#  3      1              0          1        1       1
#  4      1              1          1        1       1
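The result of table is a table object rather than a data.frame. If a plain data.frame with the columns in the order requested in the question is needed, one possible follow-up (a sketch; the names tab, out and wanted are only illustrative) is as.data.frame.matrix followed by column reordering:

```r
# Same example data as in the answer
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

# Contingency table of row id (ind) by keyword (values), as in the answer
tab <- table(stack(setNames(strsplit(as.character(df1$functionality), " +"),
                            row.names(df1)))[2:1])

# Turn the 2-d table into a data.frame, one column per keyword
out <- as.data.frame.matrix(tab)

# Reorder the columns to match the layout asked for in the question
wanted <- c("monitoring", "investigations", "patterns", "reviews", "audits")
out <- out[wanted]
print(out)
```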

Data

df1 <- structure(list(functionality = c("monitoring investigations patterns", 
"audits patterns", "reviews audits monitoring patterns", 
"reviews audits monitoring investigations patterns"
)), class = "data.frame", row.names = c(NA, -4L))
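With this data, the grepl route mentioned in the question can also be made to work. The following is only a sketch, not part of the original answer: it assumes the set of keywords is known in advance (the keywords vector is written out by hand) and tests each one against the column with word boundaries.

```r
# Same example data as above
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

# Assumed to be known up front; order determines the column order of the result
keywords <- c("monitoring", "investigations", "patterns", "reviews", "audits")

# For each keyword, 1 if it occurs as a whole word in the row, else 0
flags <- sapply(keywords, function(k) {
  as.integer(grepl(paste0("\\b", k, "\\b"), df1$functionality, perl = TRUE))
})

result <- as.data.frame(flags)
print(result)
```

Unlike the splitting approaches, this one never creates columns for unexpected words, which may or may not be what is wanted.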