Question

I have a large data frame, and I would like to count the occurrences of every word in a particular column. For example:

Column
Hi my name is Corey!
Hi my name is John

Desired output:

Hi 2
my 2
name 2
is 2
Corey 1
John 1

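I would also like to exclude special characters, such as the "!" in "Corey!" in this example, as well as question marks, periods, and so on. Any help would be greatly appreciated, thanks!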

Answer 1

df <- data.frame(column = c('Hi my name is Corey!',
  'Hi my name is John'))
df

#column
#1 Hi my name is Corey!
#2   Hi my name is John

all_words <- unlist( # flatten the per-string word lists into one vector
  regmatches(df$column,  gregexpr('\\w+', df$column))) # extract all words, dropping punctuation
# count frequencies
freq_count <- table(all_words)
freq_count

#Corey    Hi    is  John    my  name 
#    1     2     2     1     2     2
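
The table above is ordered alphabetically, while the question asks for a word/count listing. As a minimal sketch (not part of the original answer), the counts can be sorted by frequency and turned into a two-column data frame. The names `word_counts`, `word`, and `n` are hypothetical and chosen only for illustration. Note that `\\w` also matches digits and underscores, and that matching is case-sensitive, so "Hi" and "hi" would be counted separately unless the column is passed through `tolower()` first.

```r
# Rebuild the example data from the answer above
df <- data.frame(column = c('Hi my name is Corey!',
                            'Hi my name is John'),
                 stringsAsFactors = FALSE)

# Extract every run of word characters; punctuation such as "!" or "?" is skipped
all_words <- unlist(regmatches(df$column, gregexpr('\\w+', df$column)))

# Count each word, then sort by descending frequency
freq_count <- sort(table(all_words), decreasing = TRUE)

# Convert to a two-column data frame (word_counts, word, n are illustrative names)
word_counts <- as.data.frame(freq_count, stringsAsFactors = FALSE)
names(word_counts) <- c('word', 'n')
word_counts
```

A data frame is usually easier to filter, merge, or export than a `table` object, which is why the sketch converts it at the end.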

Count word occurrences in a data frame column
