I have a dataset with many rows containing fruit descriptions, for example:
An apple hangs on an apple tree
Bananas are yellow and tasty
The apple is tasty
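I need to find the unique words in these descriptions (I have already done this), and then count how many rows each of those unique words appears in. Example: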
Apple 2 (rows)
Bananas 1 (rows)
tree 1 (rows)
tasty 2 (rows)
I have already done something like this:
rows <- data_frame %>%
  filter(str_detect(variable, "apple"))
count_rows <- as.data.frame(nrow(rows))
But the problem is that I have too many unique words, so I don't want to do this manually. Any ideas?
Answer 0 (score: 0)
One option using dplyr and tidyr could be:
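library(dplyr)
library(tidyr)
library(tibble)  # provides rowid_to_column()

df %>%
  rowid_to_column() %>%                                           # remember which row each word came from
  mutate(sentences = strsplit(sentences, " ", fixed = TRUE)) %>%  # split each description into words
  unnest(sentences) %>%                                           # one word per row
  mutate(sentences = tolower(sentences)) %>%
  filter(sentences %in% list_of_words) %>%
  group_by(sentences) %>%
  summarise_all(n_distinct)                                       # distinct row ids per word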
  sentences rowid
  <chr>     <int>
1 apple         2
2 bananas       1
3 tasty         2
4 tree          1
Sample data:
df <- data.frame(sentences = c("An apple hangs on an apple tree",
                               "Bananas are yellow and tasty",
                               "The apple is tasty"),
                 stringsAsFactors = FALSE)
list_of_words <- tolower(c("Apple", "Bananas", "tree", "tasty"))
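The reason for counting distinct row ids rather than simply counting rows after unnesting is that a word can occur more than once in the same description ("apple" appears twice in the first sentence). A minimal sketch of the difference, using the same sample sentences; the column names `word`, `occurrences` and `rows` and the object name `counts` are chosen here for illustration only:

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- data.frame(sentences = c("An apple hangs on an apple tree",
                               "Bananas are yellow and tasty",
                               "The apple is tasty"),
                 stringsAsFactors = FALSE)

counts <- df %>%
  rowid_to_column() %>%
  mutate(word = strsplit(tolower(sentences), " ", fixed = TRUE)) %>%
  unnest(word) %>%
  filter(word %in% c("apple", "tasty")) %>%
  group_by(word) %>%
  summarise(occurrences = n(),            # every occurrence, repeats within a row included
            rows = n_distinct(rowid))     # each row counted at most once per word

counts
```

For "apple" the two columns differ (three occurrences across two rows), while for "tasty" they agree.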
Answer 1 (score: 0)
In base R, this can be done as follows.
r <- apply(sapply(words, function(s) grepl(s, df[[1]], ignore.case = TRUE)), 2, sum)
as.data.frame(r)
# r
#Apple 2
#Bananas 1
#tree 1
#tasty 2
Data.
x <-
"'An apple hangs on an apple tree'
'Bananas are yellow and tasty'
'The apple is tasty'"
x <- scan(textConnection(x), what = character())
df <- data.frame(x)
words <- c("Apple", "Bananas", "tree", "tasty")
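One thing to keep in mind with `grepl` is that it matches substrings, so a word such as "apple" would also be counted in a row that only mentions "pineapple". If whole-word matching is what is wanted, one possible variation wraps each word in `\\b` word boundaries. This is only a sketch: the fourth sentence is hypothetical and added purely to show the difference, and the words are assumed to contain no regex metacharacters.

```r
# Sample sentences plus one hypothetical row to show the substring effect
sentences <- c("An apple hangs on an apple tree",
               "Bananas are yellow and tasty",
               "The apple is tasty",
               "Pineapple is sweet")
words <- c("Apple", "Bananas", "tree", "tasty")

# Plain pattern: "Apple" also matches inside "Pineapple"
substring_counts <- sapply(words, function(w)
  sum(grepl(w, sentences, ignore.case = TRUE)))

# Word boundaries restrict the match to whole words
whole_word_counts <- sapply(words, function(w)
  sum(grepl(paste0("\\b", w, "\\b"), sentences, ignore.case = TRUE)))

substring_counts
whole_word_counts
```

With this data the only count that changes is the one for "Apple", which drops from 3 to 2 once boundaries are added.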
Answer 2 (score: 0)
A base R solution is to use grepl together with sapply or lapply:
sapply(list_of_words, function(x) sum(grepl(x, tolower(df$sentences), fixed = TRUE)))
  apple bananas    tree   tasty 
      2       1       1       2 
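When the list of unique words is very long, looping `grepl` over every word scans the whole column once per word. An alternative that may scale better is to split each row once, drop repeats within the row, and tabulate. The following is a rough sketch of that idea, assuming words are separated by single spaces and that matching should be case-insensitive; `words_per_row` and `row_counts` are names made up for this example:

```r
sentences <- c("An apple hangs on an apple tree",
               "Bananas are yellow and tasty",
               "The apple is tasty")

# Split each row into lower-case words and keep each word once per row
words_per_row <- lapply(strsplit(tolower(sentences), " ", fixed = TRUE), unique)

# Tabulating the pooled words gives the number of rows containing each word
row_counts <- table(unlist(words_per_row))

# Look up the words of interest
row_counts[c("apple", "bananas", "tree", "tasty")]
```

This returns a count for every word in the data at once, so no separate step for finding the unique words is needed. Unlike the `grepl` approach, it only counts exact word matches.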