Count the number of rows that contain each word

Time: 2019-10-22 13:14:44

Tags: r text-mining

I have a dataset with many rows of fruit descriptions, for example:

An apple hangs on an apple tree
Bananas are yellow and tasty 
The apple is tasty

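I need to find the unique words in these descriptions (I have already done that), and then count how many rows each of these unique words appears in. Example: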

Apple 2 (rows)
Bananas 1 (rows)
tree 1 (rows)
tasty 2 (rows)

I have done something like this:

library(dplyr)
library(stringr)
# Keep the rows whose text contains "apple", then count them
rows <- data_frame %>%
  filter(str_detect(variable, "apple"))
count_rows <- as.data.frame(nrow(rows))

The problem is that I have far too many unique words to do this by hand for each one. Any ideas?
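The snippet above handles one word per call. A minimal sketch of applying the same str_detect() idea to every word at once, assuming the text column is called variable (as in the question) and that words is a hypothetical vector holding the unique words. Note that this counts substring matches, so "apple" would also match a row containing "pineapple".

```r
library(stringr)

# Hypothetical data reusing the names from the question
data_frame <- data.frame(
  variable = c("An apple hangs on an apple tree",
               "Bananas are yellow and tasty",
               "The apple is tasty"),
  stringsAsFactors = FALSE
)
words <- c("apple", "bananas", "tree", "tasty")  # assumed list of unique words

# For each word, count the rows whose text contains it (case-insensitive substring match)
count_rows <- sapply(words, function(w) {
  sum(str_detect(data_frame$variable, regex(w, ignore_case = TRUE)))
})

count_rows  # apple 2, bananas 1, tree 1, tasty 2
```

sapply() keeps the words as names, so the result is a named vector with one row count per word.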

3 answers:

Answer 0 (score: 0)

One dplyr and tidyr option could be:

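library(dplyr)
library(tidyr)
library(tibble)

df %>%
 rowid_to_column() %>%                                        # remember which row each word came from
 mutate(sentences = strsplit(sentences, " ", fixed = TRUE)) %>% # split each sentence into words
 unnest(sentences) %>%                                        # one word per row
 mutate(sentences = tolower(sentences)) %>%                   # make matching case-insensitive
 filter(sentences %in% list_of_words) %>%                     # keep only the words of interest
 group_by(sentences) %>%
 summarise_all(n_distinct)                                    # number of distinct source rows per word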

  sentences rowid
  <chr>     <int>
1 apple         2
2 bananas       1
3 tasty         2
4 tree          1

Sample data:

df <- data.frame(sentences = c("An apple hangs on an apple tree",
                               "Bananas are yellow and tasty",
                               "The apple is tasty"),
                 stringsAsFactors = FALSE)   

list_of_words <- tolower(c("Apple", "Bananas", "tree", "tasty"))
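A variant of the same pipeline, sketched on the same sample data: removing duplicate (row, word) pairs with distinct() and then calling count() makes the "rows per word" intent explicit and gives the result a readable column name. The column names word and rows are my own choices, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- data.frame(sentences = c("An apple hangs on an apple tree",
                               "Bananas are yellow and tasty",
                               "The apple is tasty"),
                 stringsAsFactors = FALSE)
list_of_words <- c("apple", "bananas", "tree", "tasty")

word_rows <- df %>%
  rowid_to_column() %>%                                   # keep the source row number
  mutate(word = strsplit(tolower(sentences), " ", fixed = TRUE)) %>%
  unnest(word) %>%                                        # one row per (sentence, word)
  distinct(rowid, word) %>%                               # each word counted once per sentence
  filter(word %in% list_of_words) %>%                     # keep only the words of interest
  count(word, name = "rows")                              # number of rows containing each word

word_rows
```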

Answer 1 (score: 0)

In base R, this can be done as follows:

r <- apply(sapply(words, function(s) grepl(s, df[[1]], ignore.case = TRUE)), 2, sum)
as.data.frame(r)
#        r
#Apple   2
#Bananas 1
#tree    1
#tasty   2

Data:

x <-
"'An apple hangs on an apple tree'
'Bananas are yellow and tasty' 
'The apple is tasty'"

x <- scan(textConnection(x), what = character())
df <- data.frame(x)

words <- c("Apple", "Bananas", "tree", "tasty")
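One caveat with grepl() as used above: the pattern matches anywhere in the text, so "tree" also counts a row containing "street", and "apple" one containing "pineapple". A sketch that wraps each word in the regex word-boundary anchor \b so that only whole words are counted. The fourth sentence is a made-up row added only to show the difference, and the sketch assumes the words contain no regex metacharacters.

```r
words <- c("Apple", "Bananas", "tree", "tasty")
sentences <- c("An apple hangs on an apple tree",
               "Bananas are yellow and tasty",
               "The apple is tasty",
               "A pineapple stall on the street")  # hypothetical extra row

# Substring matching: "pineapple" and "street" are counted as well
substring_counts <- colSums(sapply(words, function(s) {
  grepl(s, sentences, ignore.case = TRUE)
}))

# Whole-word matching with \b boundaries (assumes no regex metacharacters in words)
word_counts <- colSums(sapply(words, function(s) {
  grepl(paste0("\\b", s, "\\b"), sentences, ignore.case = TRUE, perl = TRUE)
}))

substring_counts  # Apple 3, Bananas 1, tree 2, tasty 2
word_counts       # Apple 2, Bananas 1, tree 1, tasty 2
```

colSums() on the logical matrix returned by sapply() does the same job as apply(..., 2, sum) in the answer.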

Answer 2 (score: 0)

A base R solution is to use grepl together with sapply (or lapply):

sapply(list_of_words, function(x) sum(grepl(x, tolower(df$sentences), fixed = TRUE)))
apple bananas    tree   tasty 
    2       1       1       2
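All three answers assume that the list of words is already known. Since the question also extracts the unique words, here is a minimal base R sketch that does both steps at once, counting for every distinct word how many rows contain it. It assumes that lower-casing and splitting on whitespace is an acceptable tokenization (punctuation is not stripped).

```r
descriptions <- c("An apple hangs on an apple tree",
                  "Bananas are yellow and tasty",
                  "The apple is tasty")

# Split each row into lower-case words (assumes whitespace-separated words)
tokens <- strsplit(tolower(descriptions), "\\s+")

# Keep each word once per row, so "apple" in row 1 is counted only once,
# then tabulate how many rows each distinct word appears in
row_counts <- table(unlist(lapply(tokens, unique)))

row_counts[c("apple", "bananas", "tree", "tasty")]  # apple 2, bananas 1, tree 1, tasty 2
```

The full row_counts table also covers every other word (an, hangs, on, ...), so no separate list of unique words is needed.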