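How do I split a column at a delimiter and populate the new columns with values that correspond to each row's content?
I have a column, `functionality`, that originally contained some combination of up to five sentences. I used `mutate()` to replace those sentences with keywords, as follows: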
mutate(functionality = str_replace(functionality,
                                   "A long sentence about audits.",
                                   "audits")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about patterns.",
                                     "patterns")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about monitoring.",
                                     "monitoring")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about reviews.",
                                     "reviews")) %>%
  mutate(functionality = str_replace(functionality,
                                     "A long sentence about investigations.",
                                     "investigations")) %>%
  as.data.frame()
<sup>Created on 2019-01-04 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup>
This produces the following column:
| functionality |
|---------------------------------------------------|
| monitoring investigations patterns |
| audits patterns |
| reviews audits monitoring patterns |
| reviews audits monitoring investigations patterns |
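I would like to split `functionality` into the separate columns `monitoring`, `investigations`, `patterns`, `audits`, and `reviews`, and fill them with values that correspond to the original column, for example: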
| monitoring | investigations | patterns | reviews | audits |
|------------|----------------|----------|---------|--------|
| 1          | 1              | 1        | 0       | 0      |
| 0          | 0              | 1        | 0       | 1      |
| 1          | 0              | 1        | 1       | 1      |
| 1          | 1              | 1        | 1       | 1      |
I have not had much success with `grepl` or `mutate_at`, but I am still fairly new to R, so I may not be using the right code.
Answer 0: (score: 0)
We can split the `functionality` column on spaces with `strsplit`, and then use `mtabulate` to get the frequencies:
library(qdapTools)
mtabulate(strsplit(df1$functionality, " +"))
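If the `functionality` column is of class `factor` (for example because of the `as.data.frame` call at the end of the pipeline; `stringsAsFactors = TRUE` was the default before R 4.0.0), convert it to `character` before splitting: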
mtabulate(strsplit(as.character(df1$functionality), " +"))
# audits investigations monitoring patterns reviews
#1 0 1 1 1 0
#2 1 0 0 1 0
#3 1 0 1 1 1
#4 1 1 1 1 1
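The reason for the `as.character()` conversion can be shown with a small sketch. The toy factor `f` below is a hypothetical example, not data from the question; it only illustrates that `strsplit` rejects a factor and accepts the converted character vector.

```r
# Hypothetical two-element factor, used only to illustrate the class issue
f <- factor(c("audits patterns", "monitoring"))

# strsplit() needs a character vector; calling it on a factor raises an error,
# which is captured here as a message instead of stopping the script
msg <- tryCatch(strsplit(f, " +"), error = function(e) conditionMessage(e))
print(msg)

# Converting to character first makes the split work
parts <- strsplit(as.character(f), " +")
print(parts)
```

Checking `class(df1$functionality)` before splitting is a cheap way to find out whether the conversion is needed at all.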
Alternatively, since the OP is using the `tidyverse`, we can use `separate_rows`/`spread` to get the expected output:
library(tidyverse)
df1 %>%
  rownames_to_column('rn') %>%
  separate_rows(functionality) %>%
  count(rn, functionality) %>%
  spread(functionality, n, fill = 0) %>%
  select(-rn)
# A tibble: 4 x 5
# audits investigations monitoring patterns reviews
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0 1 1 1 0
#2 1 0 0 1 0
#3 1 0 1 1 1
#4 1 1 1 1 1
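In recent versions of tidyr, `spread` is superseded by `pivot_wider`. The following is a minimal sketch of the same pipeline using `pivot_wider`, assuming tidyr 1.0.0 or later is installed. The final `select()` is an addition that is not part of the answer: it puts the columns in the order requested in the question and drops the helper column `rn`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data, copied from the question
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  rownames_to_column("rn") %>%                     # keep track of the original row
  separate_rows(functionality, sep = " +") %>%     # one keyword per row
  count(rn, functionality) %>%                     # frequency of each keyword per row
  pivot_wider(names_from = functionality,
              values_from = n,
              values_fill = list(n = 0L)) %>%      # absent keywords become 0
  select(monitoring, investigations, patterns, reviews, audits)

result
```

Unlike `spread`, `pivot_wider` orders the new columns by first appearance rather than alphabetically, which is why the explicit `select()` is used here to fix the order.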
Alternatively, a base R option is to convert the `list` of `vector`s into a two-column data.frame with `stack`, and then get the `table`:
table(stack(setNames(strsplit(as.character(df1$functionality), " +"),
row.names(df1)))[2:1])
# values
#ind audits investigations monitoring patterns reviews
# 1 0 1 1 1 0
# 2 1 0 0 1 0
# 3 1 0 1 1 1
# 4 1 1 1 1 1
# Example data used in the answers above
df1 <- structure(list(functionality = c("monitoring investigations patterns",
    "audits patterns", "reviews audits monitoring patterns",
    "reviews audits monitoring investigations patterns"
)), class = "data.frame", row.names = c(NA, -4L))
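Since the question mentions `grepl`, here is a minimal base R sketch along those lines. It is not part of the original answer, and the `keywords` vector is an assumption: it simply lists the five keywords in the column order the question asks for. Each keyword is matched as a whole word against every row, and the logical result is converted to 0/1.

```r
# Example data, copied from the question
df1 <- data.frame(
  functionality = c("monitoring investigations patterns",
                    "audits patterns",
                    "reviews audits monitoring patterns",
                    "reviews audits monitoring investigations patterns"),
  stringsAsFactors = FALSE
)

# Keywords in the column order requested in the question (assumed to be complete)
keywords <- c("monitoring", "investigations", "patterns", "reviews", "audits")

# For each keyword, test every row for a whole-word match;
# the \\b anchors guard against matches inside longer words
indicators <- sapply(keywords, function(k) {
  as.integer(grepl(paste0("\\b", k, "\\b"), df1$functionality, perl = TRUE))
})

# sapply() returns a matrix with one named column per keyword
result <- as.data.frame(indicators)
result
```

This gives presence/absence indicators rather than counts, so a keyword that appears twice in a row would still yield 1; the `table`-based approaches above would report 2 in that case.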