I have a data frame with two columns:
df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"), sentences = c("the cat is brown; the dog is barking; the bird is green and blue","the dog is black; the bird is yellow and blue", "the bird is blue"), stringsAsFactors = F)
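For each row, I need the total number of times the animals listed in that row's "animals" cell occur across the entire "sentences" column.
For example, for the first row of "animals", c("cat; dog; bird"): sum_occurrences_sentences_column = (cat = 1) + (dog = 2) + (bird = 3) = 6.
The result would be a third column, like this: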
df <- cbind(sum_occurrences_sentences_column = c("6", "5", "3"), df)
I tried the following code, but it does not work.
df[str_split(df$animals, ";") %in% df$sentences, ]
str_count(df$sentences, str_split(df$animals, ";"))
Any help would be appreciated :)
Answer 0: (score: 3)
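Here is a base R solution:
First remove every ";" with gsub, then split the sentences column on spaces and unlist the result into a single vector of words: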
split_sentence_column = unlist(strsplit(gsub(';','',df$sentences),' '))
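As a quick sanity check on that vector, the sketch below tabulates how often each animal appears in it. It assumes the df from the question; the name animal_counts is only illustrative.

```r
# Data frame from the question
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c("the cat is brown; the dog is barking; the bird is green and blue",
                "the dog is black; the bird is yellow and blue",
                "the bird is blue"),
  stringsAsFactors = FALSE
)

# Drop the semicolons, then split every sentence into individual words
split_sentence_column <- unlist(strsplit(gsub(";", "", df$sentences), " "))

# Tabulate the words and keep only the three animals (illustrative name)
animal_counts <- table(split_sentence_column)[c("cat", "dog", "bird")]
print(animal_counts)
```

These per-animal counts are the terms that get summed for each row in the next step.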
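Then set up a for loop. For each row, get the vector of animals, use %in% to check which words of the sentence vector appear in that animal list, and sum all the TRUE cases. The result can be assigned directly to a new df column: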
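for (i in seq_len(nrow(df))) {
  # Animals listed in row i, as a character vector
  animals = unlist(strsplit(df$animals[i], '; '))
  # Count the words in the whole sentences column that match any of them
  df$sum_occurrences_sentences_column[i] = sum(split_sentence_column %in% animals)
}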
> df
         animals                                                        sentences sum_occurrences_sentences_column
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue                                6
2      dog; bird                    the dog is black; the bird is yellow and blue                                5
3           bird                                                 the bird is blue                                3
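The same logic can also be written without an explicit loop. The following is only a sketch under the same assumptions as the loop above (the df from the question); count_animals is a hypothetical helper name, not part of the original answer.

```r
# Data frame from the question
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c("the cat is brown; the dog is barking; the bird is green and blue",
                "the dog is black; the bird is yellow and blue",
                "the bird is blue"),
  stringsAsFactors = FALSE
)

# All words from the whole sentences column, semicolons removed
words <- unlist(strsplit(gsub(";", "", df$sentences), " "))

# Hypothetical helper: how many words match any animal listed in one cell
count_animals <- function(cell) {
  animals <- unlist(strsplit(cell, "; ", fixed = TRUE))
  sum(words %in% animals)
}

# Apply the helper to every row; vapply enforces one integer per row
df$sum_occurrences_sentences_column <- vapply(df$animals, count_animals,
                                              integer(1), USE.NAMES = FALSE)
print(df$sum_occurrences_sentences_column)
```

vapply is used here rather than sapply so that an unexpected return type raises an error instead of silently changing the column type.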
Answer 1: (score: 1)
A map() approach that handles each animal in the first column in turn.
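library(tidyverse)
# All sentence fragments from the whole column, split on ";"
string <- unlist(str_split(df$sentences, ";"))
df %>% rowwise() %>%
  # For each row: split the animals, count each one in every fragment, then sum
  mutate(SUM = str_split(animals, "; ", simplify = TRUE) %>%
           map(~ str_count(string, .)) %>%
           unlist() %>% sum())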
#   animals        sentences                                           SUM
#   <chr>          <chr>                                             <int>
# 1 cat; dog; bird the cat is brown; the dog is barking; the bird...     6
# 2 dog; bird      the dog is black; the bird is yellow and blue         5
# 3 bird           the bird is blue                                      3
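One difference worth keeping in mind: str_count() counts pattern matches anywhere in a string, so it also matches inside longer words, whereas the %in% approach in the other answer compares whole words. The two agree on this data, but they could diverge if a sentence contained, say, "birds". The base R sketch below illustrates the distinction with a made-up sentence; count_matches is a hypothetical helper, not a stringr function.

```r
# Hypothetical helper: number of regex matches of `pattern` in each element of `x`
count_matches <- function(pattern, x) {
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

x <- "the birds chase a bird"  # made-up sentence, not from the question

# Substring match: also counts the "bird" inside "birds"
substring_hits <- count_matches("bird", x)

# Whole-word match: \\b anchors the pattern at word boundaries
whole_word_hits <- count_matches("\\bbird\\b", x)

print(c(substring = substring_hits, whole_word = whole_word_hits))
```

If whole-word counting is what is wanted, wrapping each animal in word boundaries before counting is one option.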