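I have a text file and want frequency counts for two sets of words. For example, with the inputs below, the output should give the number of occurrences of each set: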
setone <- c("mumbai", "delhi", "chennai")
settwo <- c("nike", "zara", "puma")
textfile <- ("brands in cites like nike zara and puma in mumbai, delhi and chennai. while many exotic brands in mumbai... disel, durby, Calvin Kline")
Any help would be appreciated.
Answer (score: 1)
Here is one approach:
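library(tidyverse)  # attaches tibble, dplyr (mutate) and stringr, among others
library(stringr)    # str_count(), str_c(); already attached by tidyverse
setone <- c("mumbai", "delhi", "chennai")
settwo <- c("nike", "zara", "puma")
textfile <- (
"brands in cites like nike zara and puma in mumbai, delhi and chennai.
while many exotic brands in mumbai... disel, durby, Calvin Kline")
# str_c(..., collapse = '|') builds an alternation such as "mumbai|delhi|chennai";
# str_count() then counts how many times that pattern matches in the text
out <- tibble(
  textfile = textfile,
  setone = str_count(textfile, str_c(setone, collapse = '|')),
  settwo = str_count(textfile, str_c(settwo, collapse = '|'))
)
# add a column with the combined count of both sets
out <- mutate(out, total = setone + settwo)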
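The pattern above matches substrings, so "puma" would also be counted inside a word like "pumas", and the match is case-sensitive. The following is a minimal sketch of a stricter variant, assuming whole-word, case-insensitive matching is what is wanted and that the set words contain no regex metacharacters. `whole_word()` and `count_set()` are hypothetical helper names, and `brands.txt` is a hypothetical file name. It also shows a per-word breakdown.

```r
library(stringr)

setone <- c("mumbai", "delhi", "chennai")
settwo <- c("nike", "zara", "puma")

# With a real file, the text could be read first, e.g.
# textfile <- paste(readLines("brands.txt"), collapse = " ")
# ("brands.txt" is a hypothetical file name.)
textfile <- "brands in cites like nike zara and puma in mumbai, delhi and chennai.
while many exotic brands in mumbai... disel, durby, Calvin Kline"

# Hypothetical helper: wrap the alternation in word boundaries,
# e.g. "\\b(mumbai|delhi|chennai)\\b". Assumes the words contain no
# regex metacharacters (otherwise they would need escaping).
whole_word <- function(words) {
  str_c("\\b(", str_c(words, collapse = "|"), ")\\b")
}

# Hypothetical helper: count whole-word, case-insensitive matches.
count_set <- function(text, words) {
  str_count(text, regex(whole_word(words), ignore_case = TRUE))
}

# Totals per set, plus the combined total (setone 4, settwo 3, total 7 here)
set_totals <- c(
  setone = count_set(textfile, setone),
  settwo = count_set(textfile, settwo)
)
set_totals["total"] <- sum(set_totals)
print(set_totals)

# Per-word breakdown: sapply() names each count after its word
per_word <- sapply(c(setone, settwo), function(w) count_set(textfile, w))
print(per_word)
```

Because of the `\\b` anchors, `count_set("pumas and puma", "puma")` counts only the standalone "puma".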