I have a large data frame like this:
df
id product
1 milk
2 200
3 gr.
4 Low
5 fat
6 milkshake
7 200
8 gr.
9 High
10 fat
...
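For each word, I need to find out which other words are similar to it. I am using grepl and can do this for one word at a time, but I don't know how to apply it to the whole data frame.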
# words_unlist is assumed to hold the words from df$product
matches1 <- as.integer(grepl(words_unlist[1], words_unlist))
matches1 <- as.data.frame(matches1)
id matches1
1 1
2 0
3 0
4 0
5 0
6 1
7 0
8 0
9 0
10 0
But I need to do the same thing for every word, like this:
df
id product matches1 matches2 matches3 ... matches10
1 milk 1 0 0 ... 0
2 200 0 1 0 ... 0
3 gr. 0 0 1 ... 0
4 Low 0 0 0 ... 0
5 fat 0 0 0 ... 1
6 milkshake 1 0 0 ... 0
7 200 0 1 0 ... 0
8 gr. 0 0 1 ... 0
9 High 0 0 0 ... 0
10 fat 0 0 0 ... 1
...
Answer 0: (score: 0)
We can use sapply to match each product against the whole df$product column with grepl:
df[paste0("matches", seq_len(nrow(df)))] <- +(sapply(df$product, grepl, df$product))
df
# id product matches1 matches2 matches3 matches4 matches5 matches6 matches7 matches8 matches9 matches10
#1 1 milk 1 0 0 0 0 0 0 0 0 0
#2 2 200 0 1 0 0 0 0 1 0 0 0
#3 3 gr. 0 0 1 0 0 0 0 1 0 0
#4 4 Low 0 0 0 1 0 0 0 0 0 0
#5 5 fat 0 0 0 0 1 0 0 0 0 1
#6 6 milkshake 1 0 0 0 0 1 0 0 0 0
#7 7 200 0 1 0 0 0 0 1 0 0 0
#8 8 gr. 0 0 1 0 0 0 0 1 0 0
#9 9 High 0 0 0 0 0 0 0 0 1 0
#10 10 fat 0 0 0 0 1 0 0 0 0 1
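One caveat with the call above: grepl treats each product as a regular expression, so the "." in "gr." matches any character. A minimal sketch, assuming literal substring matching is what is wanted, passes fixed = TRUE (a standard grepl argument). The sample data is copied from the question, and the name m is only for illustration.

```r
# Sample data copied from the question.
df <- data.frame(
  id = 1:10,
  product = c("milk", "200", "gr.", "Low", "fat",
              "milkshake", "200", "gr.", "High", "fat"),
  stringsAsFactors = FALSE
)

# Column j answers: does product[j] occur literally inside each product?
# fixed = TRUE switches off regex interpretation, so "." is a plain dot.
m <- sapply(df$product, grepl, x = df$product, fixed = TRUE)

# Unary + converts TRUE/FALSE to 1/0 before storing the new columns.
df[paste0("matches", seq_len(nrow(df)))] <- +m

# Difference between regex and literal matching:
grepl("gr.", "grams")                # TRUE, "." matches "a"
grepl("gr.", "grams", fixed = TRUE)  # FALSE, no literal "gr." in "grams"
```

On the sample data both variants give the same table; the difference only shows up when a product contains regex metacharacters that could match other text.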
Answer 1: (score: 0)
With lapply:
df[paste0("matches", seq_len(nrow(df)))] <- +(do.call(cbind,
lapply(df$product, grepl, df$product)))
df
# id product matches1 matches2 matches3 matches4 matches5 matches6 matches7 matches8 matches9 matches10
#1 1 milk 1 0 0 0 0 0 0 0 0 0
#2 2 200 0 1 0 0 0 0 1 0 0 0
#3 3 gr. 0 0 1 0 0 0 0 1 0 0
#4 4 Low 0 0 0 1 0 0 0 0 0 0
#5 5 fat 0 0 0 0 1 0 0 0 0 1
#6 6 milkshake 1 0 0 0 0 1 0 0 0 0
#7 7 200 0 1 0 0 0 0 1 0 0 0
#8 8 gr. 0 0 1 0 0 0 0 1 0 0
#9 9 High 0 0 0 0 0 0 0 0 1 0
#10 10 fat 0 0 0 0 1 0 0 0 0 1
Or using tidyverse:
library(tidyverse)
df %>%
  mutate(similar = map(product, ~
      str_detect(.x, df$product) %>%
        as.integer %>%
        as.list %>%
        set_names(str_c('matches', seq_len(nrow(df)))) %>%
        as_tibble)) %>%
  unnest(cols = similar)
Data:
df <- structure(list(id = 1:10, product = c("milk", "200", "gr.", "Low",
    "fat", "milkshake", "200", "gr.", "High", "fat")),
    class = "data.frame", row.names = c(NA, -10L))
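Since the question mentions a large data frame, note that building one column per row makes the result grow quadratically with the number of rows. A possible variation, assuming one column per distinct word is acceptable, matches only unique(df$product). The names words, m, res and the has_ prefix are made up for this sketch, and fixed = TRUE is an assumption (literal matching rather than regex).

```r
# Sample data copied from the question.
df <- data.frame(
  id = 1:10,
  product = c("milk", "200", "gr.", "Low", "fat",
              "milkshake", "200", "gr.", "High", "fat"),
  stringsAsFactors = FALSE
)

# Distinct words only: 7 patterns instead of 10 for the sample data.
words <- unique(df$product)

# vapply checks that every call returns a logical vector of length nrow(df).
m <- vapply(words, grepl, logical(nrow(df)), x = df$product, fixed = TRUE)
colnames(m) <- paste0("has_", words)

# Rows follow df, columns follow words; unary + turns TRUE/FALSE into 1/0.
res <- cbind(df, +m)

# Products that contain "milk"
df$product[m[, "has_milk"]]
# [1] "milk"      "milkshake"
```

Duplicated words such as "200" or "fat" then share a single column, which keeps the table narrower when the same products repeat many times.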