Counting word occurrences in a data frame column

Time: 2018-10-09 00:18:48

Tags: r

I have a large data frame, and I want to count the occurrences of every word in a particular column, for example:

Column
Hi my name is Corey!
Hi my name is John

Desired output:

Hi 2
my 2
name 2
is 2
Corey 1
John 1

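I would also like to exclude special characters, such as the ! in "Corey!" in this example, as well as question marks, periods, and so on. Any help would be greatly appreciated, thanks!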

1 answer:

Answer 0: (score: 1)

df <- data.frame(column = c('Hi my name is Corey!',
  'Hi my name is John'))
df

#column
#1 Hi my name is Corey!
#2   Hi my name is John

all_words <- unlist( # flatten the per-string word lists into one vector
  regmatches(df$column, gregexpr('\\w+', df$column))) # extract all words; \\w+ skips punctuation such as "!"
# count frequencies
freq_count <- table(all_words)
freq_count

#Corey    Hi    is  John    my  name 
#    1     2     2     1     2     2
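
The table above is ordered alphabetically, whereas the desired output in the question lists the most frequent words first. A minimal sketch of one way to get there, assuming a two-column data frame is an acceptable result, is shown below. The names word_counts, word and n are illustrative and do not come from the original answer, and stringsAsFactors = FALSE is added only to keep the columns as plain character vectors on older R versions.

```r
# Same example data as in the answer.
df <- data.frame(column = c('Hi my name is Corey!',
                            'Hi my name is John'),
                 stringsAsFactors = FALSE)

# Extract every run of word characters (letters, digits, underscore);
# punctuation such as "!" or "?" is not matched and is therefore dropped.
all_words <- unlist(regmatches(df$column, gregexpr('\\w+', df$column)))

# Turn the frequency table into a data frame with one row per word.
# The column names 'word' and 'n' are illustrative choices.
word_counts <- as.data.frame(table(all_words), stringsAsFactors = FALSE)
names(word_counts) <- c('word', 'n')

# Sort by descending count and reset the row names.
word_counts <- word_counts[order(-word_counts$n), ]
row.names(word_counts) <- NULL
word_counts
```

Note that the match is case-sensitive, so "Hi" and "hi" would be counted separately. If that is not wanted, applying tolower() to df$column before extracting the words is one option.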